Woolite Compiler Implementation Project
Phase 1: Semantic Processing
Phase 1.1: due 2/21
Phase 1.2: due 2/28

As a first step toward constructing a complete Woolite compiler, I would like you to implement the semantic processing required for Woolite programs. As you know, semantic processing occurs after syntactic processing. To enable you to work on semantic processing before syntactic processing, I will provide you with object code for a syntactic analyzer for Woolite that produces syntax trees in the form described by the accompanying An Intermediate Form for Woolite Programs handout. The details of using this syntactic analyzer are discussed in yet another handout: Working with the Woolite Parser.

Required Processing

As we have discussed in class, during semantic processing:

declaration descriptors are created,
information about each class, variable, and method is stored in its declaration descriptor,
references to identifier descriptors that had been used to represent identifiers in the abstract syntax tree produced by the syntactic analysis phase are augmented with references to the appropriate declaration descriptors, and
checking for context-sensitive errors is performed.

In an effort to help you budget your time, I have broken the tasks your semantic analyzer will need to perform into two sub-phases. While writing the code for the first sub-phase, you may assume that:

None of the classes defined in the programs you process will extend other classes,
The only invocations used in the program will be of the form

< invocation > -> < identifier > < actuals >

(That is, methods will not be invoked explicitly on objects or using super. This means you won't have to worry about the hash table for method name bindings during the first sub-phase.)
No expressions using arithmetic, logical, or relational operators will occur in the source program.
No type casts or string constants will be used.
No if statements or loops will be included in the source program.

Obviously, it would be quite hard to write a useful program that satisfies these assumptions, but such a program will still exercise quite a bit of the code in your compiler.

For the second sub-phase you should add the code required to handle all of the features supported by Woolite.

Error checking

While most of our discussion in class has focused on the issue of correctly associating uses of identifiers with declarations of the identifiers, a good bit of your code will be devoted to verifying the semantic consistency of the program. Once you know how to correctly interpret all the identifier names used in a program, such checking is usually straightforward, but can require quite a bit of code.

Your code should print error messages to the standard error output file. Your error messages should be as informative as possible. Each error message should include the number of the line in the source file on which the error occurred. If appropriate, information such as the name of the identifier involved should be included. In addition to printing a message for each error, you should increment the global variable errorcount each time such an error is printed. A declaration for this variable is included in syntree.h. The value of this variable will be used later to decide whether or not to proceed with code generation.
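The reporting requirements above can be gathered into a single helper so that every check prints and counts errors the same way. The following is only a sketch: the name semantic_error and the message format are assumptions made for illustration, and errorcount is defined locally only so that the fragment compiles on its own (in your compiler the declaration comes from syntree.h).

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

/* In the real project this variable is declared in syntree.h. It is
   defined here only so that the sketch is self-contained. */
int errorcount = 0;

/* Hypothetical helper: print one error message, prefixed with the source
   line number, to standard error and count it. */
void semantic_error(int line, const char *fmt, ...)
{
    va_list args;

    assert(fmt != NULL);
    fprintf(stderr, "line %d: ", line);
    va_start(args, fmt);
    vfprintf(stderr, fmt, args);
    va_end(args);
    fputc('\n', stderr);
    errorcount++;
}
```

A call such as semantic_error(12, "undeclared identifier '%s'", "count") would then include both the line number and the name of the identifier involved.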

To guide you in error checking, the following list of conditions to verify is provided. It is hopefully (but not necessarily) complete, and a violation of any of these conditions is an error:

All names used in the program are declared in some scope surrounding the use.
Any name used in an extends clause is defined as a class name for a class that appears before the subclass definition.
Any name used in a type specification is defined as a class name.
Any name used to describe a value or an object in an expression is defined as an instance variable, local variable, or formal parameter (i.e. not a class or method name).
No name is defined more than once in a given scope. In particular, it is an error to define a name within a class if a definition of that name is inherited by the class unless the definition in question is a method definition that overrides the inherited method.
The type of any variable that is subscripted is an array type.
The type of any subscript expression is int.
The type of any expression used to describe an object on which a method is invoked is a class that includes a method definition with the name used in the invocation.
The type of each expression used as an operand of any of Woolite's logical, arithmetic, or relational operators other than == is int.
The type of any expression used as the right hand side of an assignment statement is either identical to or a subtype of the type of the variable that appears on the left hand side.
The type of any expression used as the condition in an if or while statement is int.
A return statement contains an expression if and only if it appears within the statement list of a method that was not defined to return void.
The type of any expression used in a return statement is either identical to or a subtype of the method's return type.
The number of actual parameters passed to any method equals the number of formals included in the method's definition.
The type of each actual parameter expression used in an invocation is either identical to or a subtype of the type of the corresponding formal.
The name provided as a class name in a type cast is actually declared as a class name.
The type of an expression used as the operand of a type cast is either identical to or a superclass of the class name provided in the type cast.
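Several of the checks above (assignments, return statements, actual parameters, and type casts) reduce to one question: is one class identical to or a subtype of another? A minimal sketch of that test follows. It assumes a hypothetical class descriptor that records only a name and a superclass pointer, and it ignores int and array types, which your real type representation will also have to handle.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified class descriptor: only the superclass link
   matters for this check. */
typedef struct ClassDesc {
    const char *name;
    struct ClassDesc *superclass;   /* NULL when the class extends nothing */
} ClassDesc;

/* Returns 1 if `actual` is identical to or a subtype of `expected`,
   and 0 otherwise, by walking up the chain of superclasses. */
int is_same_or_subtype(const ClassDesc *actual, const ClassDesc *expected)
{
    assert(expected != NULL);
    for (const ClassDesc *c = actual; c != NULL; c = c->superclass) {
        if (c == expected)
            return 1;
    }
    return 0;
}
```

For an assignment, `actual` would be the type of the right hand side and `expected` the type of the variable on the left. For a type cast the roles are reversed: the class named in the cast must be identical to or a subtype of the type of the expression being cast.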

When errors are detected, be sure to take actions that will let you avoid printing spurious error messages later or, worse yet, crashing. For example, if the name used to specify a variable's type within its declaration is undefined, set the type field in the new variable's descriptor to a known value (NULL?) rather than leaving it uninitialized. Then, in contexts where you would access the variable's type to check the correctness of some other construct, treat the "known value" as a special case to avoid generating a redundant error message.
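As an illustration of this "known value" convention, the sketch below treats a NULL type pointer as "unknown because an error was already reported" and stays silent when it sees one. The TypeDesc structure, the function name, and the local counter are all assumptions made for the example. To keep it short, the check demands identical types rather than allowing subtypes.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical type descriptor, reduced to a printable name. */
typedef struct TypeDesc {
    const char *name;
} TypeDesc;

/* Stands in for the global errorcount in this self-contained sketch. */
static int reported = 0;

/* Returns 1 if no new error was reported, 0 if a mismatch was reported.
   A NULL type means an earlier error left the type unknown, so the
   check is skipped instead of producing a second, redundant message. */
int check_same_type(const TypeDesc *lhs, const TypeDesc *rhs, int line)
{
    assert(line > 0);
    if (lhs == NULL || rhs == NULL)
        return 1;                       /* already reported elsewhere */
    if (lhs != rhs) {
        fprintf(stderr, "line %d: type mismatch: %s vs. %s\n",
                line, lhs->name, rhs->name);
        reported++;
        return 0;
    }
    return 1;
}
```

Because the NULL case returns before any field is dereferenced, the same test also protects against crashes caused by following an uninitialized pointer.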

Sequencing the Traversal of the Syntax Tree

The fact that Woolite supports arbitrarily deep nesting of class definitions and allows forward references to declarations in all contexts except extends clauses makes it necessary to make several partial sub-passes over the syntax tree and to think very carefully about what processing to perform during each pass. In fact, you will have to make two passes over the whole program.

During the first pass, your compiler will ignore all method bodies. The goals of this pass are to create a partial declaration descriptor for each class, each instance variable and each method, and to enter bindings for all the class-method pairs in the hash table used to interpret qualified method invocations.
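One possible shape for a table of class-method pairs is sketched below. It is an assumption-laden illustration rather than the design discussed in class: the key is a pointer identifying the class together with the method's name, collisions are handled by chaining, and the bucket count and function names are invented for the example. How bindings for inherited methods get entered is not shown.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define NBUCKETS 211                     /* arbitrary prime, an assumption */

typedef struct MethodEntry {
    const void *cls;                     /* identifies the class */
    const char *method;                  /* method name */
    void *decl;                          /* the method's declaration descriptor */
    struct MethodEntry *next;            /* chain of colliding entries */
} MethodEntry;

static MethodEntry *buckets[NBUCKETS];   /* zero-initialized: all chains empty */

static unsigned hash_pair(const void *cls, const char *method)
{
    uintptr_t h = (uintptr_t)cls;
    for (const char *p = method; *p != '\0'; p++)
        h = h * 31 + (unsigned char)*p;
    return (unsigned)(h % NBUCKETS);
}

/* Returns the descriptor bound to (cls, method), or NULL if there is none. */
void *lookup_method(const void *cls, const char *method)
{
    assert(method != NULL);
    for (MethodEntry *e = buckets[hash_pair(cls, method)]; e != NULL; e = e->next) {
        if (e->cls == cls && strcmp(e->method, method) == 0)
            return e->decl;
    }
    return NULL;
}

/* Returns 1 if the binding was added, 0 if the pair was already bound. */
int enter_method(const void *cls, const char *method, void *decl)
{
    if (lookup_method(cls, method) != NULL)
        return 0;
    MethodEntry *e = malloc(sizeof *e);
    if (e == NULL) {
        perror("malloc");
        exit(EXIT_FAILURE);
    }
    unsigned h = hash_pair(cls, method);
    e->cls = cls;
    e->method = method;
    e->decl = decl;
    e->next = buckets[h];
    buckets[h] = e;
    return 1;
}
```

A return value of 0 from enter_method is a natural place to report that a method name has been defined twice within the same class.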

After this first pass is complete, the declaration descriptors you have created for classes should include pointers to any superclasses and contain pointers to lists of all of the instance variable declarations, method declarations and nested class declarations found within the class. The declaration descriptors you create for the methods in the source program will be complete except that they will not contain information about any local variables declared in the method bodies. In particular, the declaration descriptors you create for a method during the first pass must include a head pointer to a list of declaration descriptors for the method's formal parameters. You won't want to place bindings for these parameter names into the scope lists and stacks at this point, but you need the lists of formals so that you can verify correct formal-to-actual parameter type correspondence during the second pass.
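To make the description above concrete, here is one way the descriptors might be laid out at the end of the first pass. All structure and field names are hypothetical, and the representation of types is deliberately left opaque. The small helper appends formals in declaration order, which is the order needed later for matching actuals against formals.

```c
#include <assert.h>
#include <stddef.h>

struct TypeDesc;                         /* type representation not shown */
typedef struct ClassDesc ClassDesc;
typedef struct MethodDesc MethodDesc;
typedef struct VarDesc VarDesc;

struct VarDesc {                         /* instance variable, formal, or local */
    const char *name;
    struct TypeDesc *type;
    VarDesc *next;
};

struct MethodDesc {
    const char *name;
    struct TypeDesc *return_type;
    VarDesc *formals;                    /* built during the first pass */
    VarDesc *locals;                     /* left empty until the second pass */
    MethodDesc *next;
};

struct ClassDesc {
    const char *name;
    ClassDesc *superclass;               /* NULL if the class extends nothing */
    VarDesc *instance_vars;
    MethodDesc *methods;
    ClassDesc *nested_classes;
    ClassDesc *next;                     /* next sibling class in the same scope */
};

/* Append formal `f` to the end of method `m`'s list of formals. */
void add_formal(MethodDesc *m, VarDesc *f)
{
    VarDesc **tail;

    assert(m != NULL && f != NULL);
    tail = &m->formals;
    while (*tail != NULL)
        tail = &(*tail)->next;
    f->next = NULL;
    *tail = f;
}

/* Count the formals of `m`, e.g. to compare against the number of actuals. */
int count_formals(const MethodDesc *m)
{
    int n = 0;
    for (const VarDesc *f = m->formals; f != NULL; f = f->next)
        n++;
    return n;
}
```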

During this first pass, you will have to place bindings for class names and instance variables in the lists describing each scope and on the stacks of bindings associated with identifier descriptors so that when you examine the parameter and return types found in method headers you can associate the names used with the correct declarations. Since you will not process method bodies, however, you do not need to place method, parameter, or local variable names "in scope" during this pass.

During the second pass, you will examine all the method bodies you ignored on the first pass. Since the hash table used to associate class-method pairs with the correct declaration descriptors will have been completed during the first pass, you will not need to add bindings for methods to that hash table again on the second pass. You will, however, have to add bindings for all names to the scope lists and stacks (again) during this second pass.

Take advantage of the work you did on the first pass to avoid actually traversing sections of the syntax tree on the second pass when possible. For example, during the first pass, you will have traversed the subtree that describes a method's formal parameters and created a linked list of their declaration descriptors. During the second pass, you need to put bindings to these descriptors on the scope lists and stacks. You could traverse the tree again to find pointers to the descriptors, but there will be a pointer to a linked list of the descriptors in the descriptor you created for the method itself. Use this linked list, rather than the syntax tree, to write a loop that adds the needed bindings for these descriptors.
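The loop over a method's formals might look something like the sketch below. Every name in it is an assumption: it supposes that each identifier descriptor holds the top of a stack of bindings, that each scope keeps a list of the bindings it introduced, and that each formal's declaration descriptor points back at its identifier descriptor. Your own structures will differ in detail.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

typedef struct Binding Binding;

typedef struct IdentDesc {
    const char *name;
    Binding *bindings;            /* top of this identifier's binding stack */
} IdentDesc;

typedef struct VarDesc {          /* declaration descriptor for a formal */
    IdentDesc *ident;
    struct VarDesc *next;         /* next formal of the same method */
} VarDesc;

struct Binding {
    IdentDesc *ident;
    void *decl;                   /* the declaration this binding refers to */
    Binding *shadowed;            /* older binding of the same identifier */
    Binding *next_in_scope;       /* next binding introduced by this scope */
};

typedef struct Scope {
    Binding *bindings;            /* all bindings introduced by this scope */
} Scope;

/* Push a binding on the identifier's stack and record it in the scope. */
static void bind_name(Scope *scope, IdentDesc *ident, void *decl)
{
    Binding *b = malloc(sizeof *b);
    if (b == NULL) {
        perror("malloc");
        exit(EXIT_FAILURE);
    }
    b->ident = ident;
    b->decl = decl;
    b->shadowed = ident->bindings;
    ident->bindings = b;
    b->next_in_scope = scope->bindings;
    scope->bindings = b;
}

/* Second pass: walk the list saved in the method's descriptor instead of
   traversing the formals subtree of the syntax tree again. */
void bind_formals(Scope *scope, VarDesc *formals)
{
    assert(scope != NULL);
    for (VarDesc *f = formals; f != NULL; f = f->next)
        bind_name(scope, f->ident, f);
}

/* Pop every binding the scope introduced. This assumes scopes are closed
   in last-opened, first-closed order. */
void close_scope(Scope *scope)
{
    while (scope->bindings != NULL) {
        Binding *b = scope->bindings;
        scope->bindings = b->next_in_scope;
        b->ident->bindings = b->shadowed;
        free(b);
    }
}
```

The same bind_name helper could serve for local variables as their declarations are met in the method body, with close_scope undoing everything when the method ends.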

Computer Science 434
Department of Computer Science
Williams College