Woolite Compiler Implementation Project
Phases 2.1 & 2.2: Code Generation for Expressions and Statements
Code for Arithmetic Expressions & Assignments due: March 14, 2006
Code for Control Structures due: March 23, 2006

Now it is time to actually start generating code for Woolite programs. For this phase, you should complete the generation of code for most expressions, assignment statements and control structures (i.e. if and while statements).

I have broken this assignment into two sub-phases. For the first, I want you to concentrate on generating code for assignment statements and simple arithmetic expressions. By the phrase "simple arithmetic expression" I mean those expressions involving only the arithmetic operators +, -, *, and %. In particular, during this first sub-phase you should not attempt to generate code for relational operators or for the logical operators.

In the second sub-phase, you should extend your routines so that they also generate code for relational operators, logical operators, if statements and while loops. Even after completing phase 2.2, you will not have a complete code generator. In particular, you will not yet be able to generate code for invocations, constructions, or type casts. These constructs will be handled in phase 2.3.

While working on both parts of this code-generation phase, you should try to structure your code so that you maintain a separation between "high-level" and "low-level" code generation. The high-level routines are the ones that traverse the syntax tree and handle the code-generation issues that are directly related to handling Woolite programs. The low-level routines should perform functions like actually outputting instructions, allocating registers to hold temporaries and selecting the appropriate effective address with which to reference a particular operand.
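To make the separation concrete, here is a minimal sketch of the two layers, not my own solution. Everything in it is an assumption made for illustration: the Node type, the routine names emit, allocReg, freeReg and the operand helpers, the register names, and the mnemonics MOVE, ADD and SUB are hypothetical placeholders rather than the interface in codegen.h or real MC34000 assembler syntax. The point is only that genExpr decides what to generate while the routines above it decide how the text and registers are handled.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* ---- Low-level layer: output text, registers, operand syntax. --------
   All names and mnemonics are hypothetical placeholders, not taken from
   codegen.h or from the MC34000 assembler handout.                      */

static char out[2048];          /* generated text accumulates here       */
static size_t outLen = 0;
static int regsInUse = 0;       /* stack-like allocator: D0, D1, ...     */

static int allocReg(void) { assert(regsInUse < 8); return regsInUse++; }
static void freeReg(int r) { assert(r == regsInUse - 1); regsInUse--; }

static void emit(const char *op, const char *src, const char *dst) {
    int n = snprintf(out + outLen, sizeof out - outLen,
                     "\t%s\t%s,%s\n", op, src, dst);
    assert(n >= 0 && (size_t)n < sizeof out - outLen);
    outLen += (size_t)n;
}

static void immOperand(int v, char *buf, size_t size) { snprintf(buf, size, "#%d", v); }
static void frameOperand(int disp, char *buf, size_t size) { snprintf(buf, size, "%d(A6)", disp); }
static void regOperand(int r, char *buf, size_t size) { snprintf(buf, size, "D%d", r); }

/* ---- High-level layer: walks the tree, never formats output itself. -- */

typedef enum { N_CONST, N_VAR, N_ADD, N_SUB } NodeKind;

typedef struct Node {
    NodeKind kind;
    int value;                  /* N_CONST: literal; N_VAR: displacement */
    const struct Node *left, *right;
} Node;

static int isLeaf(const Node *n) { return n->kind == N_CONST || n->kind == N_VAR; }

static void leafOperand(const Node *n, char *buf, size_t size) {
    if (n->kind == N_CONST) immOperand(n->value, buf, size);
    else frameOperand(n->value, buf, size);
}

/* Generates code for n and returns the register holding its value. */
static int genExpr(const Node *n) {
    char src[32], dst[32];
    if (isLeaf(n)) {
        int r = allocReg();
        leafOperand(n, src, sizeof src);
        regOperand(r, dst, sizeof dst);
        emit("MOVE", src, dst);
        return r;
    }
    const char *op = (n->kind == N_ADD) ? "ADD" : "SUB";
    int l = genExpr(n->left);
    regOperand(l, dst, sizeof dst);
    if (isLeaf(n->right)) {
        /* Operate on the constant or memory operand directly. */
        leafOperand(n->right, src, sizeof src);
        emit(op, src, dst);
    } else {
        int r = genExpr(n->right);
        regOperand(r, src, sizeof src);
        emit(op, src, dst);
        freeReg(r);
    }
    return l;
}
```

Because genExpr never builds an instruction string itself, a change in operand syntax or register policy stays inside the low-level routines.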

The instructions output by your low-level code generator should be in the format accepted by the 34000 assembler. That is, you should be generating program text rather than a binary file. The instruction set of the MC34000 and its assembly language are described in the handouts "The MC34000 Computer" and "The MC34000 Assembler".

Even though your output will be text, it will not be very readable. Variables will be referenced using numerical displacements rather than symbolic names. One (required) way to make things easier to follow is to include lines in your code that indicate which lines of output correspond to which lines in the original input program. The best way to do this is to output assembler SOURCE directives. This will not only aid you in reading your code in these phases; it will also allow you to use the 34000 debugger when you actually generate complete programs in phase 2.3.

While you will not be optimizing your code in any sense, you should generate good naive code. In particular, you should avoid early loads of display pointers and perform arithmetic operations on variables in memory rather than first loading them into registers when possible. Also, when handling control structures you should use the techniques discussed in class to avoid generating branches to unconditional branches.
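As an illustration of the last point, here is one common way to keep a branch from landing on an unconditional branch. It may or may not match the technique presented in class. Each statement generator is told, through a hypothetical parameter named after, the label that control will reach once the statement finishes, so that a false test can branch there directly. The Stmt type, the routines emitf, genStmt and genList, and the mnemonics BFALSE and BRA are all assumptions made for this sketch, and conditions and assignments are represented by placeholder strings.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

typedef enum { S_ASSIGN, S_IF, S_WHILE } StmtKind;

typedef struct Stmt {
    StmtKind kind;
    const char *text;           /* placeholder for the assignment's code,
                                   or for the condition of an if/while   */
    const struct Stmt *body;    /* S_IF then-part or S_WHILE body        */
    const struct Stmt *next;    /* following statement in the same list  */
} Stmt;

#define FALL_THROUGH (-1)       /* "no known label follows this statement" */

static char out[2048];
static size_t outLen = 0;
static int labelCount = 0;

static void emitf(const char *fmt, ...) {
    va_list ap;
    va_start(ap, fmt);
    int n = vsnprintf(out + outLen, sizeof out - outLen, fmt, ap);
    va_end(ap);
    assert(n >= 0 && (size_t)n < sizeof out - outLen);
    outLen += (size_t)n;
}

static void genList(const Stmt *s, int after);

/* after: label that control reaches once s is done, or FALL_THROUGH. */
static void genStmt(const Stmt *s, int after) {
    switch (s->kind) {
    case S_ASSIGN:
        emitf("\t%s\n", s->text);
        break;
    case S_IF: {
        /* Reuse the caller's label instead of creating one that would
           sit directly in front of the caller's BRA. */
        int skip = (after != FALL_THROUGH) ? after : labelCount++;
        emitf("\tBFALSE\t%s,L%d\n", s->text, skip);
        genList(s->body, after);
        if (after == FALL_THROUGH) emitf("L%d:\n", skip);
        break;
    }
    case S_WHILE: {
        int top = labelCount++;
        int exitLabel = (after != FALL_THROUGH) ? after : labelCount++;
        emitf("L%d:\n", top);
        emitf("\tBFALSE\t%s,L%d\n", s->text, exitLabel);
        genList(s->body, top);  /* the body's last statement is followed by "BRA top" */
        emitf("\tBRA\tL%d\n", top);
        if (after == FALL_THROUGH) emitf("L%d:\n", exitLabel);
        break;
    }
    }
}

/* Only the last statement of a list inherits the caller's label. */
static void genList(const Stmt *s, int after) {
    for (; s != NULL; s = s->next)
        genStmt(s, s->next == NULL ? after : FALL_THROUGH);
}
```

For a loop whose body ends in an if statement, the false branch of the if goes straight to the loop's top label rather than to a label placed just before the loop's BRA. The sketch is deliberately naive: when a while loop is itself the last statement of another loop's body, the outer BRA becomes unreachable but is still emitted.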

The code you generate for variable references will need to know how to access the chain of static link pointers to objects on the heap. You will not need to know how to maintain the correct values in the static link pointers at this point. It will not be until phase 2.3 that you actually produce the code that puts the correct values in the static links. The only thing you need to decide about the static links at this point is where they will be.
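A rough sketch of what following the chain might look like is given below. It assumes, purely for illustration, that each declaration records a nesting level, that every object keeps its static link at one fixed displacement (STATIC_LINK_DISP, a made-up value), that A6 addresses the current object and A0 is free to use as a scratch address register, and that MOVE is the load mnemonic. None of these choices comes from the handouts; deciding them is part of your design.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define STATIC_LINK_DISP 4      /* assumed displacement of the static link */

/* Writes into buf the loads needed to reach the object that holds a
   variable declared at defLevel when it is referenced from useLevel.
   Returns the number of links followed. With 0 hops the variable is
   addressed through A6; otherwise through A0. Register names and the
   MOVE mnemonic are hypothetical. */
static int genStaticChain(int useLevel, int defLevel, char *buf, size_t size) {
    assert(defLevel <= useLevel && size > 0);
    size_t len = 0;
    const char *base = "A6";
    buf[0] = '\0';
    for (int hop = 0; hop < useLevel - defLevel; hop++) {
        int n = snprintf(buf + len, size - len, "\tMOVE\t%d(%s),A0\n",
                         STATIC_LINK_DISP, base);
        assert(n >= 0 && (size_t)n < size - len);
        len += (size_t)n;
        base = "A0";            /* later hops start from the loaded link */
    }
    return useLevel - defLevel;
}
```

A reference from level 3 to a variable declared at level 1 would follow two links under these assumptions, while a reference to a variable at the current level generates no loads at all.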

I do request that you use my type definitions for the operand descriptors manipulated by your code for this phase. These definitions can be found in the file opdesc.h in the phase2 subdirectory of my CS 434 "shared" directory. If you update the PHASE variable in your Makefile to "2" you will be able to simply #include "opdesc.h".

To help you get a good start, I will give you some additional C code which you are free to use as is, modify or ignore as you see fit. This code can be found in the two files codegen.h and stmtgen.c in the directory ~tom/shared/434/woolite/phase2. If you want to use either of these files, you should copy them to your own directories.

The codegen.h file contains descriptions of some of the main data structures I used in my own solution to this exercise. In particular, it includes definitions for the tables used to keep track of data temporaries (registers and memory) and address registers. The file itself contains comments that (completely?) describe these data structures.

codegen.h also contains headers for the most important routines provided by my low-level code generator. Note: I am not giving you the code for these routines. You have to write your own low-level code generator. In particular, don't be fooled by comments that say a particular variable or array will contain some value. Such comments will only be true if you write code that makes them true! Again, these files are offered to help you get a good start on your own design.

The other file I am providing is stmtgen.c. For the first part of this assignment, you do not need to generate code for statements, but you will still need routines that traverse the statement portions of the syntax tree to find all the expressions contained in statements so that you can generate code for them. If you wrote such routines yourself, you would throw out much of that code once you started working on the second part of the assignment. So, what I have given you is my version of the code to traverse the syntax tree calling "genExpr" whenever it finds an expression. As I indicated above, you are free to use, modify or ignore this file depending on how much the "hints" it contains influence your own final design.
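The general shape of such a traversal is sketched below. This is not the code in stmtgen.c. The Expr and St types are hypothetical stand-ins for your own syntax tree nodes, and genExpr is reduced to a stub that merely records which expressions it was handed, where the real routine would generate code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the real syntax tree node types. */
typedef struct Expr { int id; } Expr;

typedef enum { ST_ASSIGN, ST_IF, ST_WHILE } StKind;

typedef struct St {
    StKind kind;
    const Expr *expr;           /* assigned value, or the condition       */
    const struct St *body;      /* ST_IF then-part or ST_WHILE body       */
    const struct St *elsePart;  /* ST_IF only; may be NULL                */
    const struct St *next;      /* following statement in the same list   */
} St;

static int seen[16];            /* ids of expressions passed to genExpr   */
static int nSeen = 0;

/* Stub: the real genExpr would generate code for e. */
static void genExpr(const Expr *e) {
    assert(nSeen < 16);
    seen[nSeen++] = e->id;
}

/* Visit every statement in the list and hand each expression to genExpr. */
static void walkStmts(const St *s) {
    for (; s != NULL; s = s->next) {
        genExpr(s->expr);
        switch (s->kind) {
        case ST_ASSIGN:
            break;
        case ST_IF:
            walkStmts(s->body);
            walkStmts(s->elsePart);   /* a NULL else-part is an empty list */
            break;
        case ST_WHILE:
            walkStmts(s->body);
            break;
        }
    }
}
```

In the second sub-phase the cases for if and while are where branch and label generation would go, which is why most of a traversal written this way ends up being replaced.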


Computer Science 434
Department of Computer Science
Williams College