CS 334
Programming Languages
Spring 2002

Lecture 20

F-bounded vs "MyType" based languages

Both kinds of languages allow expression of all bad examples cited earlier - clone, equals, ColorCircle

F-bounded:

Not preserved under subclass.
Encoding requires extra type parameter in interface

Matching:

Preserved under subclass
Must distinguish exact vs. #-types. Can't send binary message to #-type.
If no occurrences of MyType then #-types give exactly same effect as subtyping.

Which is better seems to depend on taste - more experience necessary.

Last OOL Topic: Implementation issues

Implementation of OO languages is actually pretty straightforward if objects are implicit references. Recall that all objects generated by a particular class have the same set of instance variables (i.e., same collection of slots) and the same methods. Thus we can represent objects in memory as references to records, where all but the first slot hold the instance variables and the first slot holds a pointer to the virtual method table (sometimes called the vtable) for the methods of the class.

The virtual method table is a record with a slot for every method (private, protected, and public) of the class. Each slot holds the address of the code segment of the appropriate method body (or, if necessary, a closure).

In this way, if a message is sent to an object, it will indirect through the virtual method table to find the appropriate code to be executed (passing along the address of the record of instance variables to be used in the method body).

When a subclass is defined, slots for new instance variables are added at the end of the object (so old instance variables are at the same offset as with the superclass), and a new vtable is constructed by starting with a copy of the old vtable, adding new methods to the end, and replacing code pointers of overridden methods.

Note that when a method is called, no search for the appropriate method is necessary. The offset in the vtable for the method can be computed statically. The offsets for both instance variables and methods are calculated for the declared type, but the exact same offsets work for all objects generated from subclasses. Thus, method look up is the same whether the value of the expression is an object of the static type of the expression or a subtype.
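The layout just described can be simulated in a few lines. The following is only a sketch, not how any particular compiler does it: objects are Python lists whose slot 0 holds the vtable, and the names Point, ColorPoint, and send are made up for illustration.

```python
# Hypothetical sketch: an object is a record (list) whose slot 0 points to
# the vtable of its class; the remaining slots hold instance variables.

def point_get_x(self):
    return self[1]                  # slot 1 holds instance variable x

def point_describe(self):
    return "point"

def colorpoint_describe(self):
    return "color point"

def colorpoint_get_color(self):
    return self[2]                  # new instance variable, added at the end

# Method offsets are fixed once, by the class that introduces the method.
GET_X, DESCRIBE, GET_COLOR = 0, 1, 2

POINT_VTABLE = [point_get_x, point_describe]

# Subclass vtable: copy the old table, replace the code pointers of
# overridden methods, and add new methods at the end.
COLORPOINT_VTABLE = list(POINT_VTABLE)
COLORPOINT_VTABLE[DESCRIBE] = colorpoint_describe
COLORPOINT_VTABLE.append(colorpoint_get_color)

def new_point(x):
    return [POINT_VTABLE, x]

def new_colorpoint(x, color):
    return [COLORPOINT_VTABLE, x, color]

def send(obj, offset, *args):
    """Dispatch by indirecting through slot 0 at a statically known offset."""
    return obj[0][offset](obj, *args)

p = new_point(3)
cp = new_colorpoint(4, "red")
print(send(p, DESCRIBE), "/", send(cp, DESCRIBE), "/", send(cp, GET_X))
```

The call send(cp, GET_X) uses the offset computed for Point, yet works unchanged on the ColorPoint record; no search is involved.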

Unfortunately, this does not work with interfaces, as classes implementing interfaces need not have methods in the same order as the interfaces. Thus, when an expression has an interface type, the method's slot cannot be computed statically and must be looked up at run time.

Notice also that if all objects are represented as implicit references, then there are no problems in assigning an object generated from a subclass to a variable declared to be of a type corresponding to a superclass. While the record corresponding to the object of the subclass may be larger, the reference to it is always of the same size! Thus fitting an object from a subclass into a slot designed for the superclass causes no problems in Eiffel or Java.

The same situation in C++ where the object is not held as a reference results in truncation of the object and replacement of the vtable by the vtable of the superclass!

In order to support type casts and reflection, Java (and other OOL's) keep a type descriptor in the vtable to identify the class of an object at run-time.

Implementation of multiple inheritance is much more complex than for single inheritance - resulting in significant complications and greater inefficiency of method look-up.

Evaluation of OOL's.

Pro's (e.g., with Eiffel and Java)

Good use of information hiding. Objects can hide their state.
Good support for reusability. Supports generics like Ada, run-time creation of objects (unlike Ada)
Support for inheritance and subtyping provides for reusability of code.

Con's

Loss of locality.
Type-checking too rigid, unsafe, or requires link time global analysis. Others require run-time checks.
Semantics of inheritance is very complex. Small changes in methods may make major changes in semantics of subclass. It appears you must know definition of methods in superclass in order to predict impact of changes in subclass. Makes provision of libraries more complex.
Weak or non-existent support of modules.

Eiffel also provides support for a number of features of modern software engineering - e.g., assertions.

What will be impact of OOL's on programmers and computer science?

Soon will likely be more popular than procedural programming.

Many of the advantages claimed by proponents could be realized in Clu, Modula-2, or Ada (all available a decade or more ago).

My advice: Specify carefully meaning of methods, avoid long inheritance chains, and be careful of interactions of methods.

Once it implements F-bounded polymorphism, Java could be a very successful compromise between flexibility and usefulness.

Semantics of programming languages

Give a brief survey of semantics specification methods here.

Operational
Axiomatic
Denotational

Operational Semantics

May have originated with idea that definition of language be an actual implementation. E.g. FORTRAN on IBM 704.

Can be too dependent on features of actual hardware. Hard to tell if other implementations define same language.

Now define abstract machine, and give translation of language onto abstract machine. Need only interpret that abstract machine on actual machine to get implementation.

Ex: Interpreters for PCF. Transformed a program into a "normal form" program (can't be further reduced). More complex with language with states.

Expressions reduce to pair (v,s), Commands reduce to new state, s.

E.g.

    (e1, ev, s) => (m, s')    (e2, ev, s') => (n, s'')
    ----------------------------------------------------
               (e1 + e2, ev, s) => (m+n, s'')

            (M, ev, s) => (v, s'')
    ----------------------------------------
    (X := M, ev, s) => (ev, s''[v/ev(X)])


    (fun(X).M, ev, s) => (< fun(X).M, ev >, s)

    (f, ev, s) => (<fun(X).M, ev'>, s')    (N, ev, s') => (v, s'')
                (M, ev'[v/X], s'') => (v', s''')
    ------------------------------------------------------------
                    (f(N), ev, s) => (v', s''')

Type of semantics called "big-step" or natural op. semantics.
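The big-step rules translate almost line by line into a recursive evaluator. The sketch below makes several assumptions of its own: terms are tagged tuples, the environment ev maps identifiers to locations, the state s maps locations to values, and a function parameter is given a fresh location (the rule above simply writes ev'[v/X]). Assignment is treated as a command that returns only the new state.

```python
# Hypothetical encoding: ("num", n), ("var", X), ("add", e1, e2),
# ("fun", X, M), ("app", f, N). ev: identifier -> location, s: location -> value.

def eval_expr(e, ev, s):
    """Big-step evaluation: (e, ev, s) => (v, s')."""
    tag = e[0]
    if tag == "num":
        return e[1], s
    if tag == "var":
        return s[ev[e[1]]], s
    if tag == "add":
        m, s1 = eval_expr(e[1], ev, s)      # (e1, ev, s)  => (m, s')
        n, s2 = eval_expr(e[2], ev, s1)     # (e2, ev, s') => (n, s'')
        return m + n, s2
    if tag == "fun":
        # A function evaluates to a closure pairing it with its environment.
        return ("closure", e[1], e[2], ev), s
    if tag == "app":
        (_, x, body, ev1), s1 = eval_expr(e[1], ev, s)
        v, s2 = eval_expr(e[2], ev, s1)
        loc = max(s2, default=-1) + 1       # assumption: fresh integer location
        return eval_expr(body, {**ev1, x: loc}, {**s2, loc: v})
    raise ValueError(f"unknown term: {tag}")

def exec_assign(x, m, ev, s):
    """(X := M, ev, s): evaluate M, then update the location bound to X."""
    v, s1 = eval_expr(m, ev, s)
    return {**s1, ev[x]: v}

# X := inc(X), where inc = fun(Y). Y + 1, starting with X = 5.
inc = ("fun", "Y", ("add", ("var", "Y"), ("num", 1)))
ev0, s0 = {"X": 0}, {0: 5}
s_final = exec_assign("X", ("app", inc, ("var", "X")), ev0, s0)
print(s_final[ev0["X"]])
```

Note how the state is threaded through the premises from left to right, exactly as the primes on s indicate in the rules.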

Text describes small-step operational semantics (though they do not separate environment and state -- fine as long as there is no aliasing).

(M, ev, s) => (M', ev', s')
--------------------------------------
(X := M, ev, s) => (X := M', ev', s')

(X := n, ev, s) => s[n/X]   for n an integer.

Major difference is that expressions are reduced one step at a time. The two styles can be shown to be equivalent. Some advantages to each approach. I prefer big-step for implementation as it is simpler, but there are some theoretical advantages to using small-step semantics in proving type safety of languages.
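For contrast, here is a small-step sketch of the assignment rules. It assumes, as the text does, that environment and state are not separated (the state maps identifiers directly to integers), and that expressions have no side effects, so a step changes only the expression. The tuple encoding and function names are made up for illustration.

```python
# Hypothetical encoding: ("num", n), ("var", X), ("add", e1, e2).

def is_value(e):
    return e[0] == "num"

def step(e, s):
    """Perform exactly one reduction step on expression e in state s."""
    tag = e[0]
    if tag == "var":
        return ("num", s[e[1]])
    if tag == "add":
        _, e1, e2 = e
        if not is_value(e1):
            return ("add", step(e1, s), e2)     # reduce left operand first
        if not is_value(e2):
            return ("add", e1, step(e2, s))
        return ("num", e1[1] + e2[1])
    raise ValueError("no rule applies")

def run_assign(x, m, s):
    """Step M until it is an integer n, then apply (X := n, s) => s[n/X]."""
    trace = [m]
    while not is_value(m):
        m = step(m, s)
        trace.append(m)
    return {**s, x: m[1]}, trace

final, trace = run_assign("X", ("add", ("var", "X"), ("num", 3)), {"X": 2})
for t in trace:
    print(t)
print(final)
```

The trace records every intermediate expression, which is what the big-step evaluator never exposes.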

Operational Semantics: Meaning of program is sequence of states that machine goes through in executing it - trace of execution. Essentially an interpreter for language.

Very useful for compiler writers since very low-level description.

Idea is abstract machine is simple enough that it is impossible to misunderstand its operation.

Axiomatic Semantics

No model of execution.

Definition tells what may be proved about programs. Associate axiom with each construct of language. Rules for composing pieces into more complex programs.

Meaning of construct is given in terms of assertions about computation state before and after execution.

General form:

			{P} statement {Q}

where P and Q are assertions.

Meaning is that if P is true before execution of statement and statement terminates, then Q must be true after termination.

Assignment axiom:

	{P [expression / id]} id := expression  {P}

e.g.

		{a+17 > 0} x := a+17 {x > 0}

		{x > 1} x := x - 1 {x > 0}
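The substitution P[expression / id] can be read semantically: the precondition holds in a state exactly when P holds in the state obtained by performing the assignment. The sketch below uses that reading, with assertions modeled as Python predicates on dictionaries; assign_pre is a made-up helper name, and the check over a finite range is a sanity test, not a proof.

```python
def assign_pre(post, var, expr):
    """Return the precondition post[expr/var] for `var := expr`."""
    return lambda st: post({**st, var: expr(st)})

# {x > 1} x := x - 1 {x > 0}
post = lambda st: st["x"] > 0
pre = assign_pre(post, "x", lambda st: st["x"] - 1)

# The computed precondition should agree with x > 1 on every tested state.
agrees = all(pre({"x": k}) == (k > 1) for k in range(-50, 50))
print(agrees)
```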

While rule:

	If {P & B} stats {P}, then {P} while B do stats {P & not B}

I.e., if P is an invariant of stats, then after execution of the loop, P will still be true but B will have failed.

Composition:

	If {P} S1 {Q}, {R} S2 {T}, and Q => R, 
		then {P} S1; S2 {T}

Conditional:

	If {P & B} S1 {Q}, {P & not B} S2 {Q}, 
			then {P} if B then S1 else S2 {Q}

Consequence:

	If P => Q, R => T, and {Q} S {R},
			then {P} S {T}

Prove program correct if show

	{Precondition} Prog {PostCondition}

Usually easiest to work backwards from Postcondition to Precondition.

Ex:

	{Precondition: exponent0 >= 0}
	base <- base0
	exponent <- exponent0
	ans <- 1
	while exponent > 0 do
		{assert:  ans * (base ** exponent) = base0 ** exponent0}
		{           & exponent >= 0}
		if odd(exponent) then
			ans <- ans * base
			exponent <- exponent - 1
		else
			base <- base * base
			exponent <- exponent div 2
		end if
	end while
	{Postcondition: exponent = 0}
	{               & ans = base0 ** exponent0}

Let us show that:

	P =  ans * (base ** exponent) = (base0 ** exponent0) & exponent >= 0

is an invariant assertion of the while loop.

The proof rule for a while loop is:

	If {P & B} S {P}  then  {P} While B do S {P & not-B}

We need to show P above is invariant (i.e., verify that {P & B} S {P}).

Thus we must show:

{P & exponent > 0}
if odd(exponent) then
    ans <- ans * base
    exponent <- exponent - 1
else
    base <- base * base
    exponent <- exponent div 2
end if
{P}

However, the if..then..else.. rule is:

    if {P & B} S1 {Q} and {P & not-B} S2 {Q} then 
                                    {P} if B then S1 else S2 {Q}.

Thus it will be sufficient if we can show

(1) {P & exponent > 0 & odd(exponent)}
            ans<- ans*base; exponent <- exponent - 1 {P}

and

(2) {P & exponent > 0 & not-odd(exponent)}
            base <- base * base; exponent <- exponent div 2 {P}

But these are now relatively straight-forward to show. We do (1) in detail and leave (2) as an exercise.

Recall the assignment axiom is {P[exp/X]} X := exp {P}.

If we push P "back" through the two assignment statements in (1), we get:

{P[ans*base/ans][exponent - 1/exponent]} 
                ans<- ans*base; exponent <- exponent - 1 {P}

But if we make these substitutions in P we get the precondition is:

    ans*base* (base ** (exponent - 1)) = base0 ** exponent0 
            & exponent - 1 >= 0

which can be rewritten using rules of exponents as:

    ans*(base ** exponent) = base0 ** exponent0 & exponent >= 1

Thus, by the assignment axiom (applied twice) we get

(3) {ans*(base**exponent) = base0**exponent0 & exponent >= 1}
            ans<- ans*base; exponent <- exponent - 1 {P}

Because we have the rule:

    If {R} S {Q} and R' => R  then {R'} S {Q}

To prove (1), all we have to do is show that

(4)     P & exponent > 0 & odd(exponent) => 
                    ans*(base ** exponent) = base0 ** exponent0 
                    & exponent >= 1

where P is

    ans*(base**exponent) = (base0**exponent0) & exponent >= 0.

Since ans * (base ** exponent) = (base0 ** exponent0) appears in both the hypothesis and the conclusion, there is no problem with that. The only difficulty is to prove that exponent >= 1.

However, exponent > 0 => exponent >= 1, since exponent is an integer.

Thus (4) is true, and hence, by (3) and the rule above, (1) is true.

A similar proof shows that (2) is true, and hence that P truly is an invariant of the while loop!
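The annotated program can also be run with its assertions checked dynamically. This is only a test on particular inputs, not a substitute for the proof; the function name power is made up, and Python's // and ** stand in for div and **.

```python
def power(base0, exponent0):
    """Fast exponentiation with the loop invariant P checked at run time."""
    assert exponent0 >= 0                       # precondition
    base, exponent, ans = base0, exponent0, 1

    def invariant():
        # P = ans * (base ** exponent) = base0 ** exponent0 & exponent >= 0
        return (ans * base ** exponent == base0 ** exponent0
                and exponent >= 0)

    assert invariant()                          # P holds on loop entry
    while exponent > 0:
        if exponent % 2 == 1:                   # odd(exponent)
            ans = ans * base
            exponent = exponent - 1
        else:
            base = base * base
            exponent = exponent // 2
        assert invariant()                      # {P & B} S {P}
    # P & not B gives the postcondition.
    assert exponent == 0 and ans == base0 ** exponent0
    return ans

print(power(3, 13))
```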

Axiomatic semantics due to Floyd & Hoare, Dijkstra also major contributor. Used to define semantics of Pascal [Hoare & Wirth, 1973]

Too high level to be of much use to compiler writers.

Perfect for proving programs correct.

Denotational Semantics

Mathematical definition of meaning of programming constructs. Find denotation of syntactic elements.

E.g. (4 + 2), (12 - 6), and (2 * 3) all denote the same number.

Developed by Scott and Strachey, late 60's early 70's

Program is defined as a mathematical function from states to states. Use these functions to derive properties of programs (e.g. correctness, soundness of typing system, etc.)

Start with functions defined by simple statements and expressions, combine these to get meaning of more complex statements and programs.

Tiny

Syntactic Domains:

    I in Ide 
    E in NumExp 
    B in BoolExp
    C in Command

Formal grammar

    E ::= 0 | 1 | read | I | E1 + E2 | fn x => E | E1 (E2)
    B ::= true | false | E1 = E2 | not B
    C ::= I := E | output E | if B then C1 else C2 | 
                while B do C |  C1; C2

Semantic Domains:

    State   =   Memory x Input x Output
    Memory  =   Ide -> [Value + {unbound}]
    Input   =   Value*
    Output  =   Value*
    Value   =   Nat + Bool + (Nat -> Value)

where Nat = {0, 1, 2, ...} is as defined above, and Bool = {true, false}

We assume that the following built-in functions are defined on the above domains:

    and : Bool x Bool -> Bool, 
    if...then...else... : Bool x Y x Y -> Y + {error} 
                                 for any semantic domain Y,     
    =  : Value x Value -> Bool,
    hd : Value* -> Value, 
    tl : Value* -> Value*,

where each of these has the obvious meaning.

In the denotational semantics given below, we use s or (m, i, o) for a typical element of State, m for Memory, i for Input, o for Output, and v for Value.

Denotational Definitions:

We wish to define:

E: NumExp -> denotations (or meanings) of numeric expressions
B: BoolExp -> denotations (or meanings) of Boolean expressions
C: Command -> denotations of commands

Note that in the first instance the valuation of an expression may

result in an error,
depend on the state, or
cause a side effect.

Therefore we will let E have the following functionality:

        E : NumExp -> [State -> [[Value x State] + {error}]]

where we write

  [[E]]s = (v,s') where v is E's value in s & s' is the state 
                               after evaluation of E.
      or = error, if an error occurs

B and C are defined similarly. The three functions are defined below.

Define E : NumExp -> [State -> [[Value x State] + {error}]] by:

    E [[0]]s = (0,s)

    E [[1]]s = (1,s)

    E [[read]](m, i, o) = if (empty i) then error, 
                                        else (hd i, (m, tl i,  o))

    E [[I]](m, i, o) = if m I = unbound  then  error, 
                                        else  (m I, (m, i, o))

    E [[E1 + E2]]s = if (E [[E1]]s = (v1,s1) & E [[E2]]s1 = (v2,s2))  
                                then (v1 + v2, s2)
                                else error

    E [[fn x => E]]s  = fun n in Nat. E [[E]](s[n/x])

    E [[E1 (E2)]]s = E [[E1]]s (E [[E2]]s)

Note difference in meaning of function here from that of operational semantics!

Define B: BoolExp -> [State -> [[Value x State] + {error}]] by:

    B [[true]]s = (true,s)
    
    B [[false]]s = (false,s)
    
    B [[not B]]s = if  B [[B]]s = (v,s') then  (not v, s'), 
                                          else  error
                        
    B [[E1 = E2]]s = if  (E [[E1]]s = (v1,s1)  & E [[E2]]s1 = (v2,s2))  
                               then  (v1 = v2, s2)
                               else error

Define C : Command -> [State -> [State + {error}]] by:

    C [[I := E]]s = if  E [[E]]s = (v,(m,i,o)) then (m[v/I],i,o)
                                                else  error

where m[v/I] is identical to m except that the value of I is v.

    C [[output E]]s = if  E [[E]]s = (v,(m,i,o)) then (m,i,v.o)
                                                  else  error

where v.o is the result of attaching v to the front of o.

    C [[if B then C1 else C2]]s = if  B [[B]]s = (v,s')  
                                            then  if  v  then  C [[C1]]s'
                                                         else  C [[C2]]s'
                                            else  error
                                            
    C [[while B do C]]s = if  B [[B]]s = (v,s') 
                                     then if v then if  C [[C]]s' = s''  
                                                    then C [[while B do C]]s''
                                                    else  error
                                               else  s'
                                     else  error
                                   
    C [[C1; C2]]s = if  C [[C1]]s = error then error
                                           else  C [[C2]] ( C [[C1]]s)

End Tiny
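The definitions of E, B, and C can be transcribed into a program in which each valuation function maps syntax to a Python function on states. This is a sketch under several assumptions: syntax is encoded as tagged tuples, memory is a dictionary (a missing key plays the role of unbound), input and output are tuples, the string "error" stands for the error element, and the fn and application cases are omitted.

```python
ERROR = "error"     # stands for the {error} summand

def binary(d1, d2, op, s):
    """Thread the state through two denotations, left to right."""
    r1 = d1(s)
    if r1 == ERROR:
        return ERROR
    r2 = d2(r1[1])
    if r2 == ERROR:
        return ERROR
    return (op(r1[0], r2[0]), r2[1])

def E(e):
    """E : NumExp -> [State -> [[Value x State] + {error}]]"""
    tag = e[0]
    if tag == "num":
        return lambda s: (e[1], s)
    if tag == "read":       # s = (m, i, o); take the head of the input
        return lambda s: ERROR if not s[1] else (s[1][0], (s[0], s[1][1:], s[2]))
    if tag == "ide":
        return lambda s: ERROR if e[1] not in s[0] else (s[0][e[1]], s)
    if tag == "add":
        return lambda s: binary(E(e[1]), E(e[2]), lambda a, b: a + b, s)
    raise ValueError(tag)

def B(b):
    tag = b[0]
    if tag in ("true", "false"):
        return lambda s: (tag == "true", s)
    if tag == "not":
        def neg(s):
            r = B(b[1])(s)
            return ERROR if r == ERROR else (not r[0], r[1])
        return neg
    if tag == "eq":
        return lambda s: binary(E(b[1]), E(b[2]), lambda x, y: x == y, s)
    raise ValueError(tag)

def C(c):
    """C : Command -> [State -> [State + {error}]]"""
    tag = c[0]
    if tag in ("assign", "output"):
        def simple(s):
            r = E(c[-1])(s)
            if r == ERROR:
                return ERROR
            v, (m, i, o) = r
            return ({**m, c[1]: v}, i, o) if tag == "assign" else (m, i, (v,) + o)
        return simple
    if tag == "if":
        def cond(s):
            r = B(c[1])(s)
            if r == ERROR:
                return ERROR
            return C(c[2])(r[1]) if r[0] else C(c[3])(r[1])
        return cond
    if tag == "while":
        def loop(s):
            r = B(c[1])(s)
            if r == ERROR:
                return ERROR
            if not r[0]:
                return r[1]
            s2 = C(c[2])(r[1])
            return ERROR if s2 == ERROR else loop(s2)   # recursive definition
        return loop
    if tag == "seq":
        def seq(s):
            s1 = C(c[1])(s)
            return ERROR if s1 == ERROR else C(c[2])(s1)
        return seq
    raise ValueError(tag)

# sum := 0; n := read; while not (n = 0) do (sum := sum + n; n := read); output sum
body = ("seq", ("assign", "sum", ("add", ("ide", "sum"), ("ide", "n"))),
               ("assign", "n", ("read",)))
prog = ("seq", ("assign", "sum", ("num", 0)),
        ("seq", ("assign", "n", ("read",)),
         ("seq", ("while", ("not", ("eq", ("ide", "n"), ("num", 0))), body),
                 ("output", ("ide", "sum")))))
result = C(prog)(({}, (3, 4, 0), ()))
print(result)
```

Running the same program on empty input yields error, because the first read fails.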

Notice that definition of while is a recursive definition.

Thus, if B [[B]]s = (true, s1) and s' = C [[S]]s1, then C [[while B do S]]s = C [[while B do S]]s'

Solution involves computing what is known as least fixed points.
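A least fixed point can be approached through a chain of approximations, starting from the function that is undefined everywhere. The sketch below illustrates this for one fixed loop, while n != 0 do n := n - 1, under simplifying assumptions: the state is just the natural number n, None stands for "undefined", and the names F, bottom, and approx are made up.

```python
BOTTOM = None       # stands for "undefined" (no result yet)

def bottom(n):
    """The totally undefined function, the starting approximation."""
    return BOTTOM

def F(w):
    """Functional whose least fixed point is the meaning of
    `while n != 0 do n := n - 1` (state = the value of n)."""
    def w_next(n):
        if n == 0:              # test fails: the loop returns the state
            return n
        return w(n - 1)         # run the body once, then continue with w
    return w_next

def approx(k):
    """k-th approximation: F applied k times to bottom."""
    w = bottom
    for _ in range(k):
        w = F(w)
    return w

# approx(k) is defined exactly on the states needing fewer than k tests.
print([approx(3)(n) for n in range(5)])
```

Each approximation extends the previous one, and the meaning of the loop is the limit of the chain.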

Denotational semantics extremely popular during 70's and 80's, but has generally fallen out of favor because it is not as good for modelling concurrent systems. Also natural operational semantics is very similar, but does a better job with concurrency and is easy to convert to an interpreter.

Which is best?

No good answer, since have different uses.

Complementary definitions. Can be compared for consistency.

Programming language definitions are usually still given in English, since formal definitions are often too hard to understand or require too much sophistication. Formal semantics is, however, gaining much more acceptance, and is now a relatively standard intro graduate course in CS curricula.

Some success at using formal semantics (either denotational or operational) to automate compiler construction (similar to use of formal grammars in automatic parser generation).

Semantic definitions have also proved to be useful in understanding new programming constructs.

Semantics and type theory now being used to prove security properties of code.


kim@cs.williams.edu
