Woolite Compiler Implementation Project
Phase 2.3: Code Generation for Methods
Due: April 28, 2002

To complete a version of your Woolite compiler capable of producing runnable 34000 code, you must generate the instructions to handle method invocations and generate the correct code before and after the bodies of methods. Your final output should be an assembly language program that accurately implements the Woolite program provided as input to your compiler. This assembly code should be written to standard output.

To make it easy to process Woolite programs with your compiler, I will provide a short shell script named wool (along with lots of other new odds and ends in the shared/434/woolite/phase2.3 sub-directory). This script assumes your executable is named woolite (as it will be unless you have changed the Makefile I provided). The wool script will expect the name of a Woolite source file as input. To make things look right, the source file's name should end with a .w suffix. The script will run the .w file through your compiler and then take what your compiler wrote to standard output and provide it as input to the 34000 assembler. To make it possible to use #include directives in the assembly code you output (I'll explain why you will need this ability later), the wool script will run your compiler's output through the C pre-processor, cpp, before sending it to the assembler.

The "final" output of this process will be a tmem file, which will be read as input by the wc34000 interpreter program. In addition, the script will leave the actual output of your compiler in a file whose name is obtained from the name of the input file by replacing the .w suffix with a .s suffix. Similarly, the output listing produced by the assembler will be stored in a file ending with a .l suffix (this file is actually more useful to read than the .s file because it shows in which word of memory each line of code is stored).

To enable you to keep your output code separate from error messages and diagnostics, I have written my code so that all output produced by printree and DumpDecldescs is directed to "stderr". In addition, in case you want to keep the output that goes to standard output and standard error together, my routines start each line of output they produce with a ";". This will cause the assembler to treat such lines as comments.
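If you add diagnostics of your own, you may want them to follow the same convention. The following is only a sketch: the helper name diag is hypothetical, and it assumes each message fits on a single line (a message containing a newline would leave its second line without the leading ";").

```c
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical helper: print one printf-style diagnostic line, prefixed
 * with ";" so the assembler treats it as a comment if it ends up mixed
 * into the generated code. */
void diag(FILE *out, const char *fmt, ...)
{
    va_list ap;

    fputs("; ", out);
    va_start(ap, fmt);
    vfprintf(out, fmt, ap);
    va_end(ap);
    fputc('\n', out);
}
```

A call such as diag(stderr, "entering method %s", name) keeps the message on standard error, yet it remains harmless if both streams are captured in one file.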

How it All Starts

Execution will begin with the first line of code in the assembler file you produce. When this line is executed, the stack pointer will be set but no register will be pre-loaded with the address of the global variable area. To make it possible for you to load this value, the assembler puts the address of the first unused word in memory (the word after the last instruction in your code) in word 1 of memory. Thus, the instruction

    MOVE    1,freePtr

is probably the first line of code your compiler should output for any input program.

After this instruction, you should generate code to create an object of the main class and to invoke the "main" method of this class. After the JSR to the main method, place a HALT instruction. Then, generate code for all the methods of all the classes defined in the program.
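The order of the start-up code described above can be sketched as a single routine in your code generator. This is an illustration under stated assumptions, not a required design: the function name emit_startup and its parameter are hypothetical, the operand form shown for JSR is an assumption (use whatever form your invocation code actually requires), and the instructions that create the main object are left as a comment because they depend on your object layout.

```c
#include <stdio.h>

/* Hypothetical routine: write the fixed start-up sequence to `out`.
 * `main_entry` stands for whatever label your compiler associated with
 * the "main" method of the main class. */
void emit_startup(FILE *out, const char *main_entry)
{
    /* Word 1 of memory holds the address of the first unused word. */
    fprintf(out, "    MOVE    1,freePtr\n");

    /* Placeholder only: the code that creates an object of the main
     * class belongs here. */
    fprintf(out, "; create an object of the main class here\n");

    /* ASSUMPTION: JSR is written with the entry label as its operand. */
    fprintf(out, "    JSR     %s\n", main_entry);

    /* Stop the machine when main returns. */
    fprintf(out, "    HALT\n");
}
```

Calling emit_startup(stdout, label) once, before any method code is generated, keeps the HALT ahead of the first method body so that execution cannot fall through into it.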

Associating Labels with Method Tables and Method Entry Points

To enable you to generate correct code for invocations, I have included fields designed to refer to code labels in the declaration descriptor formats used for methods and classes. These fields are named entrylbl and methodtab. They are intended to hold the "code labels" placed on the first line generated for a method or for the method jump table for a class.

The type of the entrylbl and methodtab fields is determined by the value of the #define CODELBL. If you do not include a #define for CODELBL, it defaults to the type char. However, if you have declared a special type to hold code labels, you can #define CODELBL to be the name of that type. For example, if your type's name was codelabel, you would simply include the define

#define CODELBL codelabel

in your .c files.

In addition, I have added "code label" as one of the members of the union type used to represent operand descriptors. This will make it easier to implement functions in your low-level code generator that output instructions accessing information through code labels. This means, however, that the opdesc.h file now also depends on the #define for CODELBL. As a result, this #define should probably be one of the first lines in your .c files.
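As an illustration of the ordering, here is one way the top of a .c file might look. Everything about the label type is an assumption made for the example: the struct layout, the helper new_label, and the scheme of appending a serial number to a base name are hypothetical, and your own representation may differ. The only point taken from the handout is that the #define comes before any provided header that uses CODELBL.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical label type: a pointer to a record holding the label text. */
typedef struct codelabel_rec {
    char text[32];
} *codelabel;

/* Define CODELBL before including any provided header that uses it. */
#define CODELBL codelabel
/* #include "opdesc.h"   (provided header; it would be included here) */

static int next_serial = 0;   /* makes every generated label unique */

/* Hypothetical helper: build a fresh label such as "putNum0". */
codelabel new_label(const char *base)
{
    codelabel l = malloc(sizeof *l);

    if (l == NULL) {
        perror("malloc");
        exit(EXIT_FAILURE);
    }
    snprintf(l->text, sizeof l->text, "%s%d", base, next_serial++);
    return l;
}
```

With a scheme like this, a field declared with type CODELBL can hold the result of new_label directly.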

While I have included the entrylbl and methodtab fields in the declaration descriptor types, it is up to your code to set these fields.

An Input/Output Library?

To make your compiler useful, you must provide a standard set of input/output methods. These methods should be named putNum, getNum, outChar, and getChar. They should provide a way to execute the corresponding 34000 instructions from a Woolite program. The methods putNum and outChar will expect one value parameter (of type int). The methods getNum and getChar will take no parameters and return int values. All four of these methods will be associated with a built-in class named IOlib. To use these methods, a programmer will either create a new object of the IOlib type or define one or more classes within a program that extend IOlib. Within any class that extends IOlib, the names putNum, getNum, outChar, and getChar can be used to perform simple input/output operations.

To make it easy for you to add support for this input/output "library" to your compiler, I have done two things: 1) I have included code in the init_symbtab routine which creates declaration descriptors for the four I/O methods and for the IOlib class and then creates a binding to place the name IOlib in scope; and 2) I have provided you with a file of 34000 assembly language code named iolib.h that contains the actual assembly language code for these procedures.

Since you have defined your own types to represent code labels, I could not include code in init_symtab to associate code labels with the entrylbl and methodtab fields found in the declaration descriptors for the IOlib class or the four I/O methods. You will have to include code in your compiler to initialize these fields. You can use the "lookup" function to access the declaration descriptor for the IOlib class.

The iolib.h file is not a C header file. It is a file of assembly language code to be included with the code you generate. Since wool runs the assembly code you produce through the C pre-processor, you can use this file by including the line

#include "iolib.h"

in the assembly language output your compiler produces. You will probably want to place this line right after the code for all the methods you generate.
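In the code generator this amounts to one more line of output once the last method has been generated. A minimal sketch follows; the function name emit_iolib_include is hypothetical. The directive is written with the "#" in the first column, which is the safest choice because some pre-processor modes only recognize directives that start there.

```c
#include <stdio.h>

/* Hypothetical routine: ask the C pre-processor to splice in the
 * assembly code for the four I/O methods. Call it after the code for
 * all user-defined methods has been written. */
void emit_iolib_include(FILE *out)
{
    fprintf(out, "#include \"iolib.h\"\n");
}
```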

As with the code I gave you in ~tom/shared/434/woolite/phase2/stmtgen.c, you may find that you have to modify the code in my iolib.h file before you can use it. In particular, since your code for handling code labels may insist on sticking a few digits at the end of each label used in your assembly code, you may have to change the names on the first lines of the input/output method code to include such digits. As a result, I will expect all of you to submit copies of the actual version of this file you use. To make sure that this happens, you must include iolib.h in the HDR line of your Makefile.

Debugging Support?

I still have not worked out the details required to ensure your compiler will support source level debugging, but I did not want to delay the distribution of this handout any longer. As a result, that topic will be discussed in an addendum to this handout.


Computer Science 434
Department of Computer Science
Williams College