Proposed Knowledge Units for Programming Languages
for Curriculum 2001

as formulated by the Programming Language KFG

[PL] Programming Languages

The Programming Language Knowledge Area Focus Group was formed less than two weeks before this report was due, so the report presented here is preliminary. We have discussed the topics included here extensively, but wish to present them to a larger group of programming languages experts for feedback. We began with the knowledge units of CC '91, but made extensive changes to update the material and shift its balance.

Contributing members of this knowledge area focus group are Kim Bruce (Williams College), Benjamin Goldberg (NYU), Chris Haynes (Indiana U.), Gary Leavens (Iowa State U.), and John Mitchell (Stanford U.).

The main changes of this report from CC '91 are

Shifting logic programming from core to an intermediate or advanced topic (we have not yet included a KU for this, however, as we have only focussed on those topics having core material).
Significantly cutting back the material on automata, regular expressions, and context-free grammars to the material most needed for programming languages. We anticipate that some of the material omitted here will be picked up in the algorithms [AL] or foundations [FO] KU's.
Increasing the attention to modules and information hiding.
Including more detail in most knowledge units, esp. those involving object-oriented languages.
Rearranging and reordering KU's to increase their coherence.

The changes we have made resulted in a decrease of roughly 11 hours from the KU's specified in CC '91.

We have included mainly core and intermediate level topics in the KU's. A few advanced topics occur, but we made no attempt to be complete in these (for instance we do not yet have the intermediate / advanced KU's necessary to support a compiler course). We did label the sections of each KU that belong in each category. The following is a short listing of the KU topics:

PL1: History and Overview of Programming Languages (1.5/.5)

PL2: Virtual Machines (1)

PL3: Formal Languages and Language Analysis (1.5/.5)

PL4: Language Translation Systems (1.5/1/1)

PL5: Types (5/0/1)

PL6: Control of execution (3/1)

PL7: Declarations and Modules (5)

PL8: Run-time Storage Management (3)

PL9: Programming Language Semantics (2/0/3)

PL10: Functional Programming Paradigms (5)

PL11: Object-Oriented Programming Paradigm (4/0/1)

PL12: Distributed and Parallel Programming Constructs (3/2/2)

The numbers after each KU title give the number of hours devoted to the KU. If there is only a single number then all topics are at the core level. If there are multiple numbers then the first represents core hours, the second represents intermediate hours, and the last (if present) represents advanced hours. We were inconsistent in listing intermediate and advanced topics (and in fact weeded out most advanced topics). Most of those remaining are there because at least one member of the committee felt they were candidates for moving to a stronger requirement.

We included justifications for most of the KU's but, for lack of time, not for all of them. We will add the rest later. We also hope to have the benefit of comments from a wider group by the time we prepare the next version of this report.

The three processes of theory, abstraction, and design are well represented here. PL3 (formal languages) and PL9 (formal semantics) provide the strongest representation of theory in the curriculum, though aspects of theory also show up in discussions of type checking in PL5. Abstraction is very well represented in these KU's, as programming languages provide abstractions for controlling computations and representing information. PL2 (Virtual Machines), PL5 (Types), PL6 (Control of Execution), and PL7 (Declarations and Modules), as well as sections of PL10 (Functional Paradigm) and PL11 (Object-Oriented Paradigm), all present a heavy dose of abstraction. Design shows up in the examination of trade-offs in choosing language constructs, as well as in PL4 (Language Translation Systems), PL8 (Run-time Storage Management), and PL12 (Distributed and Parallel Programming Constructs). Design also shows up in discussions of various programming language paradigms.

The knowledge units presented here do not have any explicit requirements for mathematics and physical sciences, though we expect that discrete mathematics will be important for the prerequisites of many topics. We have not yet prepared model courses with these KU's. We have not yet completely fixed the lists of prerequisites and requisites, so many will be revised later.

As a last thought before presenting the programming language knowledge units, we note that the use of programming languages over the last 30 to 40 years makes it clear that trends in language usage change dramatically over time. Mainstream languages have moved from FORTRAN and COBOL in the '60s, to PL/I, Pascal, and C in the '70s (and beyond for the last two), to Smalltalk and C++ in the '80s (and beyond), to Java in the '90s, and that is ignoring large programming subcultures using languages like LISP/Scheme, APL, PROLOG, ML, Visual Basic, Perl, etc. Clearly no programmer can expect to use the same language, or even the same language family or paradigm, throughout her career. An understanding of the core concepts of programming languages and of different programming language paradigms will better enable programmers to keep up with the language changes that will occur over their careers.

PL1: History and Overview of Programming Languages (1.5/.5)

A brief historical survey of major early developments in programming languages, beginning with the evolution of procedural high-level languages. An overview of contemporary programming paradigms and their related languages, including procedural, object-oriented, functional, logic, and parallel programming. Language families and trends.

Recurring Concepts: evolution, conceptual and formal models, complexity of large problems, efficiency, tradeoffs and consequences, levels of abstraction, security

Core Lecture Topics: (One and a half hours minimum)

Early languages: FORTRAN, ALGOL, COBOL, LISP, BASIC.
The evolution of procedural languages (the ALGOL 60, PL/I, ALGOL 68, Pascal, C, Euclid, Modula-2, Ada83, and Ada95 chain of development)
Imperative, algorithmic

Procedural, structured paradigm and languages (Pascal, C, Modula-2, and Ada83)
Object-oriented paradigm and languages: (Simula, Smalltalk, C++, Java, Eiffel, Modula-3, Oberon, CLOS, Dylan)

Mostly-functional, algorithmic, higher-order paradigm and higher-order languages with eager evaluation (Common Lisp, Scheme, ML, APL)
Purely-functional, algorithmic paradigm: (Miranda, Haskell, Clean)
Declarative (non-algorithmic) Languages: Logic programming languages (e.g., Prolog)

Intermediate Lecture Topics (half hour minimum)

Parallel programming paradigms: (CSP, Occam, Turing Plus, SR, Emerald, Java threads, Linda, Ada, Orca)
Scripting paradigm (UNIX Shell, Perl, Tcl, Python, Visual BASIC)

Justification for core sections

The core sections of this KU give the student a historical perspective on the evolution of the important modern programming languages in the commercial and academic arenas. Of particular importance is the emphasis on identifying families of programming languages and their relationships. This helps the student recognize substantial similarities among languages in a family, which in turn aids in learning new languages in a given paradigm (e.g., C++ and Java).

Prerequisites: PF

PL2: Virtual Machines (1)

Actual vs. virtual computers. The understanding of programming languages in terms of the corresponding virtual machines (regardless of the actual architecture on which they run). Language translation understood conceptually as an implementation on a virtual machine, followed by a sequence of translations to simpler core languages through a hierarchy of virtual computers.

Recurring Concepts: binding, conceptual and formal models, levels of abstraction.

Core Lecture Topics: (one hour minimum)

What is a virtual machine? (examples: Java Virtual Machine, lambda calculus, etc.)
Hierarchy of virtual machines presented to the user through the program, the translator, the operating system, etc.

Justification for core sections

This unit is important because it emphasizes to the student the architectural independence of good programming language design. Using a virtual machine, or a hierarchy of virtual machines, language features (and even the implementation of these features) can be compared and evaluated using a machine model that is substantially simpler than actual processors. This greatly facilitates understanding the dynamic behavior of a program and proving properties about such behavior. In certain cases, the use of a virtual machine, such as the Java Virtual Machine, facilitates implementations across a wide range of physical machines.
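
As a minimal sketch of this idea, assuming a hypothetical three-instruction stack machine (the opcodes PUSH, ADD, and MUL and the function `run` are invented for illustration and are not taken from the Java Virtual Machine or any real machine), the following Python fragment gives a program its meaning entirely in terms of a simple virtual machine, independent of the hardware beneath it:

```python
# Hypothetical stack-based virtual machine. The instruction set is an
# illustrative assumption, not that of any real VM.
def run(program):
    """Execute a list of (opcode, operand) pairs and return the top of stack."""
    stack = []
    for op, arg in program:
        if op == "PUSH":
            stack.append(arg)
        elif op == "ADD":
            right, left = stack.pop(), stack.pop()
            stack.append(left + right)
        elif op == "MUL":
            right, left = stack.pop(), stack.pop()
            stack.append(left * right)
        else:
            raise ValueError(f"unknown opcode: {op}")
    return stack.pop()


# The expression (2 + 3) * 4 translated to code for the virtual machine.
PROGRAM = [("PUSH", 2), ("PUSH", 3), ("ADD", None), ("PUSH", 4), ("MUL", None)]
```

Calling `run(PROGRAM)` evaluates the translated expression. A translator for a richer language could target this machine without knowing anything about the physical processor.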

Prerequisites:

PL3: Formal Languages and Language Analysis (1.5/.5)

Application of regular expressions and context free languages as formal descriptions of language syntax and their use in programming language analysis.

Recurring Concepts: conceptual and formal models, levels of abstraction

Core Lecture Topics: (one and a half hours minimum)

Overview of regular expressions, context-free grammars (and syntax diagrams), and their use in specifying and implementing programming languages.
Context-free grammars and parse-trees, ambiguous grammars.

Intermediate Lecture Topics: (one half hour minimum)

Applications of regular expressions in lexical analysis.

Justification for core sections

The student should be made aware of how the syntax of a programming language is formally specified, thus enabling the programming language designer to communicate the language syntax to the users and language implementers. Being able to comprehend regular expressions and grammars, such as the Backus-Naur Form (BNF), is crucial for any programmer attempting to learn a new language. Furthermore, the students should understand that there exists a direct relationship between the formal specification of syntax and the structures created by the compiler during parsing. This provides the foundation for automatic generation of lexers and parsers in modern compilers.
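
The relationship between a formal lexical specification and its implementation can be sketched as follows. This is a minimal illustration, assuming a hypothetical token set for a tiny expression language (the names `TOKEN_SPEC`, `MASTER`, and `tokenize` are not part of any curriculum material). Each token class is specified by a regular expression, and the lexer is derived mechanically from that specification:

```python
import re

# Hypothetical token classes for a tiny expression language.
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("IDENT", r"[A-Za-z_]\w*"),
    ("OP", r"[+\-*/()=]"),
    ("SKIP", r"\s+"),
]
# One master pattern with a named group per token class.
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(text):
    """Return a list of (kind, lexeme) pairs, discarding whitespace."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = MASTER.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at position {pos}")
        if match.lastgroup != "SKIP":
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens
```

Changing the language's lexical syntax then means editing the table of regular expressions rather than the scanning loop, which is the essence of what lexer generators automate.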

Prerequisites: AL5

PL4: Language Translation Systems (1.5/1/1)

An overview of the language translation process, encompassing the range from compilers to interpreters. The focus is on the architecture of compilers.

Recurring Concepts: binding, conceptual and formal models, consistency and completeness, levels of abstraction, ordering in space, ordering in time, efficiency, tradeoffs and consequences

Core Lecture Topics: (one and a half hours minimum)

Comparison of pure interpreters vs. compilers; operation and use
Architecture of a compiler (lexical analysis phase, parsing, symbol table, code generation, optimization)

Intermediate Lecture Topics (one hour minimum)

Parsing: concrete and abstract syntax, abstract syntax trees
Code generation by tree walking
Optimization techniques

Advanced Lecture Topics (one hour minimum)

Application of regular expressions in table-driven lexical scanners
Application of cfg's in table-driven and recursive descent parsing

Justification for core sections

The core components of this unit present an overview of how languages are implemented. Although the precise algorithms used in compilers and interpreters can be considered more advanced topics, it is critical that the student understand how the various aspects of a language, such as the syntax, type system, etc., correspond to components of the implementation, such as the parser, type checker, etc. Without this view, the student will be unable to understand the extent to which decisions made by the language designer have an effect on the implementation of the language.

Suggested Laboratories:

(open) Develop a simple parser (e.g., recursive descent) for arithmetic expressions that returns an expression tree.
(closed) Use a compiler-generator tool to specify and run a finite state automaton that will accept a small part of the lexical grammar of some programming language.
Design and exercise a table-driven parser for a simple context-free language.
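
The first laboratory above might be approached along the following lines. This is only a sketch under stated assumptions: the grammar in the comments, the nested-tuple tree representation, and the function name `parse` are all choices made for illustration. Each nonterminal of the grammar becomes one function, which is the defining feature of recursive descent.

```python
import re


def parse(text):
    """Parse an arithmetic expression into a nested-tuple expression tree."""
    # Simplistic tokenizer (an assumption): characters outside the
    # token set are silently ignored.
    tokens = re.findall(r"\d+|[-+*/()]", text)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        token = tokens[pos]
        pos += 1
        return token

    def expr():  # expr ::= term (('+' | '-') term)*
        node = term()
        while peek() in ("+", "-"):
            op = advance()
            node = (op, node, term())  # left-associative
        return node

    def term():  # term ::= factor (('*' | '/') factor)*
        node = factor()
        while peek() in ("*", "/"):
            op = advance()
            node = (op, node, factor())
        return node

    def factor():  # factor ::= NUMBER | '(' expr ')'
        token = peek()
        if token is None:
            raise SyntaxError("unexpected end of input")
        if token == "(":
            advance()
            node = expr()
            if peek() != ")":
                raise SyntaxError("expected ')'")
            advance()
            return node
        if token.isdigit():
            return int(advance())
        raise SyntaxError(f"unexpected token {token!r}")

    tree = expr()
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r}")
    return tree
```

The layering of `expr`, `term`, and `factor` encodes operator precedence directly in the grammar, so `parse("1 + 2 * 3")` groups the multiplication beneath the addition.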

Prerequisites: AL6, AR3, PL2, PL3

PL5: Types (5/0/1)

Models and descriptions of data. Elementary and structured data types. Type checking and type inference. Polymorphism. User-defined and abstract data types.

Recurring Concepts: binding, security, efficiency, tradeoffs and consequences, reuse, conceptual and formal models, levels of abstraction, ordering in space.

Core Lecture Topics: (five hours minimum)

Data type as set of values with set of operations

Elementary data types: Booleans, characters, integers, floating-point numbers.
Structured data types

product types: arrays, strings, records (structs in C/C++)
coproduct types: unions, variant records
algebraic types (as in ML, Miranda, Haskell)
recursive types: representation by pointers or references
arrow types: function and procedure types
parameterized types

Type checking

Goals

detect errors; preserve intended meaning of program operations
representation independence

Static vs. Dynamic type systems

dynamic strong type checking in Smalltalk and Lisp/Scheme
tradeoff between flexibility and catching errors early

Basic type-checking (without polymorphism)

explicit type checking with explicit type declarations
type inference

User-defined types

type abbreviations (like typedef in C/C++, type in ML)
ADT's and preview of encapsulation via modules
type equality: structural type equivalence (as in ML) vs. name equivalence (as in Java)

Parametric polymorphism (Generics)

intuition and applications: polymorphic operations on lists, other data structures
comparison of implicit (ML, Scheme) and explicit (Ada generics, C++ templates) polymorphism
comparison with ad hoc polymorphism (typecase, instanceof in Java), static overloading (as in Ada83, Haskell), and dispatch in OO languages (and multiple dispatch in CLOS, Cecil)

Subtype polymorphism

structural subtyping rules for records, variants, functions, objects.
type casts (downcast vs. upcast and safety)

Advanced Lecture Topics (one hour minimum)

Type-checking algorithms

explicit algorithm & implicit algorithm w/ parametric polymorphism (Hindley/Milner, as in ML)
algorithm for checking subtyping in an explicit language

Justification for core sections

Progress in type systems is at the core of many advances in programming languages. Types are an important source of abstraction in helping the programmer think about problems in a higher-level way. Advances in type-checking have made the use of stricter type systems a greater help without the cost in expressiveness of older systems. Topics of structural vs. name equivalence of types, parametric polymorphism, and subtype polymorphism are issues that programmers need to understand in working with modern programming languages.

Suggested Laboratories:

(closed) Find a seeded type error in a dynamically-typed language.
(open) Develop the same program (e.g., an interpreter) in a dynamically typed language (e.g., Scheme) and in a statically typed language (e.g., ML). Compare the development efforts and the resulting code.
(open) Implement some generic collection ADT (e.g., set[T], bag[T], etc.) using a recursive representation (e.g., linked lists). Student gains experience with language support for structured types, recursive types, ADT's, specifications and polymorphism.
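
For the last laboratory above, a starting point could look like the following sketch. It assumes Python's optional type annotations stand in for a statically checked language (Python itself does not enforce them at run time; an external checker would be needed), and the names `Node` and `LinkedSet` are hypothetical. It combines a recursive type, a type parameter, and an ADT whose representation is hidden behind its operations:

```python
from dataclasses import dataclass
from typing import Generic, Iterator, Optional, TypeVar

T = TypeVar("T")


@dataclass(frozen=True)
class Node(Generic[T]):
    """Recursive type: a node holds a value and a reference to the rest."""
    head: T
    tail: "Optional[Node[T]]"


class LinkedSet(Generic[T]):
    """Hypothetical set[T] ADT; clients see only the methods below."""

    def __init__(self) -> None:
        self._first: Optional[Node[T]] = None  # hidden representation

    def __iter__(self) -> Iterator[T]:
        node = self._first
        while node is not None:
            yield node.head
            node = node.tail

    def __contains__(self, item: object) -> bool:
        return any(value == item for value in self)

    def __len__(self) -> int:
        return sum(1 for _ in self)

    def add(self, item: T) -> None:
        """Insert item unless an equal element is already present."""
        if item not in self:
            self._first = Node(item, self._first)
```

Because clients never touch `_first`, the linked list could later be replaced by another representation without changing client code, which is the representation independence listed among the goals of type checking.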

Prerequisites: PF2, PF3, PF5, PL2

PL6: Control of execution (3/1)

Flow of control associated with evaluating expressions and executing statements. User-defined expressions and statements.

Recurring Concepts: levels of abstraction, ordering in time, security, efficiency, tradeoffs and consequences.

Core Lecture Topics: (three hours minimum)

Expressions, order of evaluation of sub-expressions

Reasons for underspecifying evaluation order: reordering can improve efficiency, allow parallelism
side effects and possible non-termination prevent reordering
conditional expressions: some subexpressions are not evaluated; strictness
functions as abstraction of expressions

Statements

assignment, sequencing (S1;S2), function/procedure calls, goto
conditional and case/switch, loops (while-do, do-until), break and continue
procedures as abstractions of statements
iterators as abstraction of loop structure over data structures

Exceptions and exception handling

exception handlers in Ada and try-statements in Java
termination model vs. resumption model

Intermediate Lecture Topics: (1 hour minimum)

parallel composition (S1||S2)
Functions delay evaluation

closures: lambda in Scheme, fun in ML, blocks in Smalltalk
lazy languages and user-defined control constructs (contrast Haskell and Smalltalk)

Justification for core sections

This KU covers basic components of programming languages, emphasizing subtle or difficult issues that may cause problems for programmers. For example, issues of underspecified evaluation order in expressions with side effects have caused serious difficulties in programs. Most of the standard expressions and statements will be covered only in passing; the emphasis will be on the use of programmer-created abstractions (functions, procedures, and iterators), and especially on newer or less familiar constructs, such as iterators and exception handlers, that are important parts of common modern languages.
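
Two of these issues can be made concrete with a short sketch. It is an illustration only, and every name in it (`subtract_with_trace`, `my_if`, `safe_div`) is hypothetical. The first function makes evaluation order observable through a side effect; Python specifies left-to-right evaluation of operands, whereas a language that leaves the order unspecified could record the operands in either order. The second shows functions delaying evaluation, so that a user-defined conditional evaluates only the branch it selects:

```python
def subtract_with_trace():
    """Evaluate a() - b() where each operand has a visible side effect."""
    log = []

    def traced(name, value):
        log.append(name)  # side effect: record the order of evaluation
        return value

    result = traced("a", 1) - traced("b", 2)  # Python evaluates left to right
    return result, log


def my_if(condition, then_thunk, else_thunk):
    """User-defined conditional: only the chosen branch is evaluated."""
    return then_thunk() if condition else else_thunk()


def safe_div(x, y):
    # Wrapping each branch in a lambda delays its evaluation, so x / y
    # is never computed when y is zero.
    return my_if(y != 0, lambda: x / y, lambda: float("inf"))
```

Had `my_if` taken plain values rather than parameterless functions, both branches would be evaluated before the call, and `safe_div(1, 0)` would fail with a division error.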

Prerequisites: PL2

PL7: Declarations and Modules (5)

Methods of sharing and restricting access to information in programming languages.

Recurring Concepts: binding, complexity of large problems, levels of abstraction, ordering in space, security, reuse, and evolution.

Core Lecture Topics: (five hours minimum)

Declarations

binding and allocation: aliases, constants vs. variables
visibility of declarations, static vs. dynamic scope
lifetimes (impact of garbage collection and closures)

Parameterization mechanisms

parameter-passing: reference, copy (value, result, and value-result), name, sharing (as in Java) and correspondence to declaration forms.
type parameterization: generics or templates (as in Ada, C++), implicit polymorphism (as in ML, Haskell, and Scheme) [overlap w/PL5.4 to be resolved]

Mechanisms for sharing and restricting visibility of declarations (blocks in ALGOL-like languages, modules in Ada and ML, classes and subclasses in OO languages, packages in Java)
Use of modules to enforce information hiding for data abstractions and code reuse in libraries

information hiding and abstraction boundaries
separation of interface and implementation; existential type binding
aliasing and how it may violate information hiding
separate compilation and linking (interface vs. implementation)
information hiding vs. inheritance (protected in C++ and Java)

Justification for core sections

Declaration and scoping issues and problems need to be understood by programmers in order to understand when names are visible and when the objects they refer to are accessible. Understanding parameter passing modes is essential in order to understand the difference between, for example, parameter passing in Java, Pascal/Ada, and C or C++. Languages like C and Java offer only one parameter passing mode, but other languages offer different or multiple modes. Programmers need to understand the differences between these to avoid confusion. Finally, with the increasing importance of libraries and the general concept of code reuse, a deeper understanding of the purpose of modules and current manifestations of them in programming languages is essential. In particular, information hiding as a way of forming abstraction barriers is key to enabling reuse.
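
The kind of confusion meant here can be shown with a small sketch of the sharing mode. Python is used on the assumption that its argument passing behaves like the sharing mode listed above for Java; the function names are hypothetical. The callee receives a reference to the caller's object, so mutation through the parameter is visible to the caller, while rebinding the parameter is not:

```python
def rebind(items):
    """Rebinding the parameter creates a new list; the caller's is untouched."""
    items = items + [99]  # `items` now names a different object
    return items


def mutate(items):
    """Mutating through the parameter changes the object the caller shares."""
    items.append(99)


def demo():
    original = [1, 2]
    rebind(original)
    after_rebind = list(original)  # snapshot: still [1, 2]
    mutate(original)
    after_mutate = list(original)  # snapshot: now [1, 2, 99]
    return after_rebind, after_mutate
```

Under pass-by-reference, the assignment inside `rebind` would also have changed the caller's variable; under pass-by-value of the whole list, the `append` in `mutate` would have had no visible effect.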

Suggested Laboratories:

(closed) Exercise the same program in languages with dynamic and static scoping, and/or with different parameter mechanisms. Explain the different effects.
(open) Write a large program, in teams, that uses several modules.
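
The closed laboratory above contrasts static and dynamic scoping. Where only a statically scoped language is available, the contrast can still be sketched by simulating dynamic scope with an explicit stack of bindings. The following is such a simulation, with all names (`dynamic_env`, `lookup`, `show_dynamic`, and so on) invented for illustration:

```python
x = "global"


def show():
    return x  # static scope: always the module-level x


def caller():
    x = "local to caller"  # a different variable; invisible to show()
    return show()


# Simulated dynamic scope: the most recent live binding wins.
dynamic_env = [{"x": "global"}]


def lookup(name):
    for frame in reversed(dynamic_env):
        if name in frame:
            return frame[name]
    raise NameError(name)


def show_dynamic():
    return lookup("x")


def caller_dynamic():
    dynamic_env.append({"x": "local to caller"})  # binding lives during the call
    try:
        return show_dynamic()
    finally:
        dynamic_env.pop()
```

Here `caller()` yields the global binding because `show` resolves `x` where it was defined, while `caller_dynamic()` yields the caller's binding because the lookup follows the chain of active calls.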

Prerequisites: PL2, PL5

PL8: Run-time Storage Management (3)

Allocation, recovery, and reuse of storage during program execution.

Recurring Concepts: binding, levels of abstraction, ordering in space, reuse, security.

Core Lecture Topics: (three hours minimum)

Static allocation (as in Fortran or C static)
Stack-based allocation and its relationship with recursion
Heap-based allocation
Garbage collection - include benefits and problems of each technique

Explicit allocation/deallocation (as in C, C++)
Reference counting
Overview of garbage collection algorithms (mark and sweep and/or copying)

Justification for core sections

These sections are critical for having the student understand the effect of language design on implementation, as well as the effect of programming techniques on efficiency and memory usage. With the advent of the first truly popular garbage collected language, Java, it is increasingly important for the student to understand the implementation issues and program correctness issues (explicit vs. automatic allocation/deallocation) involved in choosing a language based on a particular memory allocation/deallocation model.
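
To make the mark-and-sweep idea concrete, here is a toy sketch over an explicit object graph. It is a simplification under stated assumptions: the `Obj` class, the `heap` list, and the `roots` list are hypothetical stand-ins for what a real run-time system tracks, and nothing here reflects how any particular collector is implemented.

```python
class Obj:
    """Hypothetical heap object with outgoing references."""

    def __init__(self, name):
        self.name = name
        self.refs = []
        self.marked = False


def mark(roots):
    """Mark phase: flag every object reachable from the roots."""
    worklist = list(roots)
    while worklist:
        obj = worklist.pop()
        if not obj.marked:
            obj.marked = True
            worklist.extend(obj.refs)


def sweep(heap):
    """Sweep phase: keep marked objects, clear their marks, drop the rest."""
    live = [obj for obj in heap if obj.marked]
    for obj in live:
        obj.marked = False
    return live


def collect(heap, roots):
    mark(roots)
    return sweep(heap)
```

If two unreachable objects refer to each other, `collect` still reclaims them, because liveness is decided by reachability from the roots. Plain reference counting would not reclaim such a cycle, since each object keeps the other's count above zero.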

Prerequisites: PF4, PF5, PL2

PL9: Programming Language Semantics (2/0/3)

Use of formal and informal models to describe programming language semantics.

Recurring concepts: conceptual and formal models, levels of abstraction.

Core Lecture Topics: (2 hours minimum)

Informal semantics (e.g., the ALGOL 60 or Scheme reports)
Formal semantics

kinds of formal semantics: operational (natural, SOS), axiomatic (Hoare logics), denotational (Domains, functions)
formal operational semantics of some small language (like PCF ...).
Benefits of formal semantics

Advanced Lecture Topics: (3 hours minimum)

Denotational Semantics.

Domains, fixed points
Denotational semantics of some language features, comparison to their operational semantics

Axiomatic Semantics

Hoare triples
Weakest pre-conditions

Justification for core sections

Students need to know how languages are defined so that they can read language reference manuals, and so that they can more quickly learn new languages. Students are also likely to design something resembling a programming language eventually (such as a class library or user interface extension mechanism), and thus should know how they can do this carefully. Operational semantics is the easiest formalism to teach in this area, and also applies most easily to concurrent programming.

Suggested Laboratories:

(open) Given a semantics of a simple language (e.g., PCF), write an interpreter which implements the semantics.
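
A starting point for this laboratory might resemble the following sketch. It assumes a tiny call-by-value language, smaller than PCF (there is no fixed-point operator), with expressions encoded as tagged tuples; the encoding and the name `evaluate` are illustrative choices. Each branch corresponds to one rule of a big-step (natural) operational semantics:

```python
def evaluate(expr, env=None):
    """Big-step semantics: map an expression and an environment to a value."""
    env = {} if env is None else env
    tag = expr[0]
    if tag == "num":  # a numeral evaluates to itself
        return expr[1]
    if tag == "var":  # a variable evaluates to its binding
        return env[expr[1]]
    if tag == "add":
        return evaluate(expr[1], env) + evaluate(expr[2], env)
    if tag == "if0":  # only the selected branch is evaluated
        _, test, then_branch, else_branch = expr
        chosen = then_branch if evaluate(test, env) == 0 else else_branch
        return evaluate(chosen, env)
    if tag == "lam":  # a closure captures its defining environment
        _, param, body = expr
        return ("closure", param, body, env)
    if tag == "app":
        _, fn_expr, arg_expr = expr
        _, param, body, closure_env = evaluate(fn_expr, env)
        arg = evaluate(arg_expr, env)  # call-by-value: argument first
        return evaluate(body, {**closure_env, param: arg})
    raise ValueError(f"unknown expression form: {tag}")
```

For instance, applying the function that adds one to its argument is written `("app", ("lam", "x", ("add", ("var", "x"), ("num", 1))), ("num", 41))`. Because the body is evaluated in the closure's environment rather than the caller's, the interpreter implements static scoping.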

Prerequisites: PL2, PL3, PL6

PL10: Functional Programming Paradigms (5)

The functional programming paradigm. Advantages and disadvantages. Recursion over recursive data structures. Functions as data. Amortized complexity.

Recurring Concepts: conceptual and formal models, levels of abstraction, trade-offs and consequences, efficiency, reuse, security

Core Lecture Topics: (5 hours minimum)

Overview and motivation

problems with reasoning about assignment in presence of aliasing vs. referential transparency when there is no mutation
need for copying in the presence of mutation vs. safe sharing when there is no mutation

Recursion over lists, natural numbers, trees, and generalization to other recursively-defined data (using some language like Scheme, ML, Miranda, or Haskell)
Pragmatics: debugging by divide and conquer; persistency of data structures (the old version is still available when the new version is produced).
Amortized efficiency for functional data structures (e.g., amortized queues) and comparison to imperative data structures
Closures, and uses of functions as data (e.g., infinite sets, streams)

Justification for core sections

Functional programming is the main competing alternative to imperative programming. It is important both as a technique that is useful for doing certain kinds of work (e.g., prototyping language designs), and as a connection to other parts of computing. For example, functional programming techniques are used in program specification, theorem proving, and are directly related to mathematics. Moreover, functional programming is important because it teaches students new ways to think about programming, and gives them ideas on how to combine and abstract program parts that are very difficult to see in other paradigms. Recursion is a fundamental technique for functional programming, since it corresponds to recursively described data. The pragmatic features are important for students to understand how to write programs in a functional style (which is important for learning how to specify programs, for example). Knowing something about efficiency in a functional setting prevents students from using these techniques inappropriately, ties the subject to data structure and algorithm analysis, and helps students see the fundamental tradeoffs. The use of higher-order functions is the definition of this style, and important for abstraction and reuse.
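
The points about persistence and amortized efficiency can be illustrated with the classic two-list queue. The sketch below is written in Python for uniformity with the other examples, though the technique is normally presented in a functional language; lists are encoded as nested pairs with `None` as the empty list, and all names are illustrative assumptions.

```python
def reverse(lst):
    """Reverse a cons-list encoded as nested (head, tail) pairs."""
    result = None
    while lst is not None:
        head, lst = lst
        result = (head, result)
    return result


EMPTY = (None, None)  # (front list, back list held in reverse order)


def enqueue(queue, item):
    """Return a new queue; the old one is unchanged (persistence)."""
    front, back = queue
    return (front, (item, back))


def dequeue(queue):
    """Return (item, new_queue) without modifying the old queue."""
    front, back = queue
    if front is None:
        if back is None:
            raise IndexError("dequeue from empty queue")
        front, back = reverse(back), None  # occasional linear-time step
    head, rest = front
    return head, (rest, back)
```

Each element is reversed at most once between being enqueued and dequeued, so a sequence of operations that uses each queue value only once costs constant time per operation on average. That amortized bound can fail if an old queue value is reused repeatedly, which is one of the trade-offs against imperative queues worth discussing.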

Suggested Laboratories:

(open) Write a denotational or operational-style interpreter for a small programming language. Students may also modify such an interpreter to experiment with choices in parameter-passing and scoping.
(open) See the programming exercises in Abelson and Sussman's book...

Prerequisites: PF4, PF5, PL2

PL11: Object-Oriented Programming Paradigm (4/0/1)

The Object-Oriented (OO) paradigm. Advantages and disadvantages. Types, classes and objects; subtyping and inheritance. Type checking.

Recurring Concepts: conceptual and formal models, levels of abstraction, trade-offs and consequences, reuse, security

Core Lecture Topics: (4 hours minimum)

Overview and motivation

problems with change in stepwise refinement method
difficulty of code reuse
evolution of programs and the need to reflect incremental changes in the program structure
terms: ADT, type, class, object (instance), method, object's instance protocol (method interface), self/this, super

Mechanisms for defining classes and instances in some OO language (e.g., Smalltalk, Java, C++, Eiffel) and for defining interfaces in Java.
Object creation and initialization.
Inheritance and dynamic dispatch

single vs. multiple dispatch (as in CLOS, Dylan)
dynamic dispatch of methods, method overriding, and method inheritance (examples of method ping-ponging)

Sketch of run-time representation of objects and method tables, how it enables method dispatch.
Distinction between subtyping and inheritance

Advanced Lecture Topics: (1 hour minimum)

Advanced OO type problems

Need for sophisticated parametric polymorphism (e.g., F-bounded or equivalent)
Problems with binary methods

Justification for core sections

Object-oriented programming and object-oriented languages represent the mainstream of programming and of programming language design. They also raise many interesting and confusing issues, which need to be discussed in relation to programming languages. Understanding of the basic terms and semantics of such languages is thus fundamental for practical programming, maintenance, and for further language design. Methods and inheritance, which define the object-oriented paradigm, are the key aspects of this understanding.
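
The run-time representation of objects and method tables listed among the core topics can be sketched by building dispatch by hand. This is a deliberately simplified model, not a description of how any real language implements objects: classes are dictionaries of methods with a parent link, objects are dictionaries with a class field, and the names `make_class`, `lookup`, `send`, and `new` are hypothetical.

```python
def make_class(parent, methods):
    """A class is a method table plus a link to its superclass."""
    return {"parent": parent, "methods": methods}


def lookup(cls, name):
    """Walk up the inheritance chain until a method called `name` is found."""
    while cls is not None:
        if name in cls["methods"]:
            return cls["methods"][name]
        cls = cls["parent"]
    raise AttributeError(name)


def send(obj, name, *args):
    """Dynamic dispatch: select the method from the receiver's run-time class."""
    return lookup(obj["class"], name)(obj, *args)


def new(cls, **fields):
    return {"class": cls, **fields}


Shape = make_class(None, {
    "area": lambda self: 0,
    "describe": lambda self: f"shape with area {send(self, 'area')}",
})
Square = make_class(Shape, {
    "area": lambda self: self["side"] ** 2,  # overrides Shape's area
})
```

Sending `describe` to a `Square` instance finds the inherited method in `Shape`, which in turn sends `area` back to the receiver and reaches the override in `Square`. This bouncing between superclass and subclass code is the behavior the topic list calls method ping-ponging. Implementations with statically known class layouts typically replace the chain walk with an indexed table.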

Suggested Laboratories:

(open) Write an interpreter for a small programming language, or a Turing machine, in an OO language, using the interpreter or visitor pattern.

Prerequisites: PF6, PL2, PL5.

PL12: Distributed and Parallel Programming Constructs (3/2/2)

Description of alternative realizations of parallel and distributed programming constructs.

Recurring Concepts: conceptual and formal models, consistency and completeness, efficiency, levels of abstraction.

Core Lecture Topics: (3 hours minimum)

Overview and motivation

Massive (exponential) computational cost of important problems
data-parallel model vs. explicit tasking models of programming
parallel vs. distributed computing (differences in granularity of parallelism and fault tolerance, physical distribution)

Communication primitives for tasking models with explicit communication (distributed programming)

message passing without linked replies (e.g., CSP, Occam, MPI, PVM)
remote procedure calls (e.g., Argus, SR)

Communication primitives for tasking models with shared memory

Semaphores and conditional critical regions
events (publish/subscribe)
threads and monitors (e.g., Ada, Java)

Intermediate Lecture Topics: (2 hours minimum)

Programming primitives for data-parallel models (vector, data-parallel, and SIMD machines)

parallel machine architectures
language extensions (plural data, compiler directives, Fortran 90, HPF, C*)
new languages (e.g., ZPL, Data parallel C, NESL, Paralation)

Comparison of language features for parallel and distributed programming.

Advanced Lecture Topics: (two hours minimum)

Optimistic concurrency control (for tasking models with shared memory) vs. locking and transactions
Coordination languages (e.g., Linda)
Asynchronous remote procedure calls (pipes)
Other approaches

functional languages (e.g., Sisal, Erlang)
nondeterministic languages (e.g., Unity, Parlog)

Justification for core sections

Parallel and distributed programming are becoming quite common; for example, Java includes locks and constructs that allow one to program monitors, and the Java Jini and RMI mechanisms allow one to do RPC. Most programming for window systems involves threads. Even businesses are using distributed programming extensively in CORBA and client-server contexts. Thus it is crucial that students see and understand the basic mechanisms found in such contexts: remote procedure calls (RPC) and monitors.
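
As an illustration of the monitor mechanism, here is a bounded buffer written in monitor style. It is a sketch in Python rather than Java, using the standard `threading` module; the class `BoundedBuffer` and the helper `transfer` are hypothetical names. One lock protects all shared state, and two condition variables let threads wait until the buffer is not full or not empty:

```python
import threading
from collections import deque


class BoundedBuffer:
    """Monitor-style bounded buffer: a single lock guards all state."""

    def __init__(self, capacity):
        self._items = deque()
        self._capacity = capacity
        self._lock = threading.Lock()
        self._not_full = threading.Condition(self._lock)
        self._not_empty = threading.Condition(self._lock)

    def put(self, item):
        with self._not_full:  # acquires the shared lock
            while len(self._items) >= self._capacity:
                self._not_full.wait()  # releases the lock while waiting
            self._items.append(item)
            self._not_empty.notify()

    def get(self):
        with self._not_empty:
            while not self._items:
                self._not_empty.wait()
            item = self._items.popleft()
            self._not_full.notify()
            return item


def transfer(values, capacity=2):
    """Run one producer and one consumer thread; return what was consumed."""
    buffer, received = BoundedBuffer(capacity), []
    producer = threading.Thread(target=lambda: [buffer.put(v) for v in values])
    consumer = threading.Thread(
        target=lambda: received.extend(buffer.get() for _ in values))
    producer.start()
    consumer.start()
    producer.join()
    consumer.join()
    return received
```

The waits are placed in `while` loops rather than `if` statements so that a thread rechecks its condition after waking, which guards against the condition having changed between the notification and the moment the thread reacquires the lock.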

Suggested Laboratories:

(open) Develop a parallel program in Java.
(open) Write and measure some scientific program in HPF or some other data-parallel language.

Prerequisites: PF7, PL6
