Working with the Woolite Parser

To enable you to start your compilers "in the middle", I have constructed a syntactic analyzer for Woolite that you can use as a front-end as you implement the semantic processing and code generation phases of the compiler.

The directory ~tom/shared/434/woolite contains the code you will need to use my syntactic analyzer. Within this directory, you will find sub-directories for each of the major assigned phases of the project (phase1, phase2, and phase3). Each of these directories will contain versions of my parser specialized to the corresponding phase of the project. Within each of the phase subdirectories, I will store files named syntree.h, syntax.h, and symtab.h, which contain type definitions describing the structure of the syntax trees and symbol table entries produced by the Woolite syntax analyzer I have provided. Such files are called "header" files (hence the use of the ".h" suffix). These directories will also contain object files (".o" files) containing the executable code for the parser, scanner, and symbol table routines.

Within the directory ~tom/shared/434/woolite you will also find a sub-directory named startup. The most important file in the startup subdirectory is named Makefile. There is also a short source file named main.c. You should copy both of these files from ~tom/shared/434/woolite/startup into the directory in which you plan to create the files needed to complete the phase. While you may have the urge to copy other files from ~tom/shared/434/woolite, the files Makefile and main.c are the only files you should copy.

The Makefile is intended for use as input to the Unix "make" utility. In case you are unfamiliar with "make", it is a utility that reads a file describing how to build an object program from one or more source files and, when invoked, performs whatever compilation steps are needed to build the object program. For example, after you have copied Makefile and main.c into one of your directories, make that directory your current working directory and then type

make

The system will perform the necessary steps to create an executable parser for Woolite from the main.c file and the object files provided in ~tom/shared/434/woolite.

The directory ~tom/shared/434/woolite/samples contains some sample Woolite programs. You can test the parser you have just built by typing

woolite ~tom/shared/434/woolite/samples/allsyntax.c

The result should be a printed display of the syntax tree built by the parser.

In the remainder of this handout, I will attempt to tell you all you need to know to work with Makefile, at least for this phase of the project. In addition, those of you unfamiliar with make may wish to read the document "Pmake -- A Tutorial".

Makefile has been designed to enable you to easily combine your code with my code through all the phases of the project. The file starts with about 20 lines describing variables you can (and must) set to customize the file to your compiler and parameters (targets) you can specify when invoking make to alter the way in which it interprets the contents of Makefile.

The most important variables at this point are HDR and SRC. You will find their definitions shortly after the comment lines. The definitions look like:

HDR =
SRC = main.c

The value of SRC should be a list of the names of the source files for the compiler kept in your directory. Initially, the main.c file you copied from ~tom/shared/434/woolite will be the only such file. When you create other source files, however, you should add their names. For example, if you create a file named resolve.c to hold the code for this phase, you should edit Makefile changing the definition of SRC to

SRC = main.c resolve.c

The variable HDR should be set equal to a list of the names of all the header files kept within your directory. The header files I have provided for you should not be included here. These files will be accessed from my directory because of the setting of the HDRs variable.

Two other variable definitions you will need to change eventually are those for PHASE and SUBPHASE. The PHASE variable determines which phase subdirectory of ~tom/shared/434/woolite will be used to access my header files and object files. You will need to change it as you move on to later phases. The main function of the SUBPHASE variable is to enable my scripts to place your finished product in the right container when you submit it for grading. If you forget to change it before you submit subphase 2 or 3 of a particular phase of the project, you may overwrite your earlier submission.

When you simply want make to compile a new version of your compiler, you will invoke it as shown above by simply typing its name. The make program can also be used to perform several other useful functions by invoking it with one of the parameters described below.

Make's operation depends on having, within Makefile, a collection of lines specifying how each object file needed to create an executable version of your compiler depends on the source and header files you and I construct. Whenever you create a new source file or add a #include to a file, this information changes. Typing

make depend

will cause make to read through all of your source files and edit Makefile to update the collection of dependency specifications as appropriate. Do not forget to run this command after making such changes to your source files.

To simplify the task of keeping current listings of your code, I have included definitions for two targets named list and listall in Makefile. If you type

make listall

listings of all of your source and header files (and of your Makefile) will be sent to the laserwriter. If you type

make list

listings of only those files that have changed since you last typed either make list or make listall will be produced.

To provide a means to quickly and reliably distribute information (like corrections to errors in handouts) to you, I will maintain a file named PROJECTNEWS in the ~tom/shared/434/woolite directory. When you execute the simple command make, the system will check to see if this file has changed since you last read its contents. If it has, it will inform you that you should read it. To read the file, type

make meread

This will show you the contents of the file and update the information make uses to tell when you last read the file.

At several points during the semester, I will ask that you submit your source files electronically. To do this, simply type

make submit

Some Useful Extras

In addition to providing a parser and a scanner, I have included three procedures in the code I have placed in ~tom/shared/434/woolite that you may find very helpful when writing code to process the syntax tree and the symbol table.

The first is the printree procedure which you may have already noticed is called from within the main program I have provided. printree expects two parameters. The first should be a pointer to a node in the syntax tree. The routine will output a somewhat readable display of the subtree to which its first parameter points. The second parameter is an integer that specifies how much the output produced by the procedure should be indented. You will generally specify 1 or 0 for this parameter. The printree routine, however, uses other values when it calls itself recursively to ensure that subtrees are indented in the output.

printree may behave quite oddly if asked to print trees that are in some way damaged. For example, if you set a child pointer to a value that is not really a pointer you are likely to see your program crash in printree. I have tried to make printree fairly tolerant of NULL pointers in unexpected places. So, you may protect yourself somewhat by setting pointer fields in syntax tree nodes and symbol table nodes to NULL rather than leaving them uninitialized when no other value is appropriate.
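
One way to follow this advice is to route every node allocation through a small helper that clears the pointer fields. The sketch below is only an illustration: the type demo_node, its fields, the constant DEMO_MAXKIDS, and the helper new_demo_node are hypothetical stand-ins, not the definitions found in syntree.h, so the idea would need to be adapted to the real node types.

```c
#include <stdlib.h>

/* Hypothetical stand-in for a syntax tree node; the real layout is
   defined in syntree.h and will differ. */
#define DEMO_MAXKIDS 3

typedef struct demo_node {
    int kind;                              /* node type tag */
    struct demo_node *child[DEMO_MAXKIDS]; /* sub-trees, NULL when absent */
    void *decl;                            /* symbol table link, NULL until set */
} demo_node;

/* Allocate a node with every pointer field explicitly set to NULL, so
   that code walking the tree never follows an uninitialized pointer.
   Returns NULL if malloc fails. */
demo_node *new_demo_node(int kind)
{
    demo_node *n = malloc(sizeof *n);
    int i;

    if (n == NULL)
        return NULL;
    n->kind = kind;
    for (i = 0; i < DEMO_MAXKIDS; i++)
        n->child[i] = NULL;
    n->decl = NULL;
    return n;
}
```

Setting the fields one by one, rather than relying on the allocator to hand back cleared memory, keeps the intent visible at the one place where nodes are created.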

The second routine provided is named visitlist. Within the syntax tree there are several list structures (sub-trees built using Nstmtlist nodes for example). When processing these lists you will usually want to apply certain operations to all sub-trees of the list. visitlist provides a way to do this without writing the same (fairly simple but boring) loop over and over again.

The function declaration for visitlist is shown below:

/* visitlist  -  Apply the function 'action' to each of the nodes */
/*               in some subtree that forms a list, skipping error nodes. */

int visitlist(node *listhead, int (*action)(), int param);

The listhead parameter should be a pointer to a tree node that is a list header (i.e. of type "N...list"). The action parameter should be the name of a function that performs the action you want to have applied to each element of the list. For example, if you write a routine:

processStmt( node *stmt )
{
...
}

which processes one statement, and the variable "elsePart" holds a pointer to the list of statements found in the else part of an if statement, a call of the form:

visitlist( elsePart, processStmt, ...)

would invoke processStmt on each statement subtree in the list elsePart.

I have slightly simplified the preceding discussion by not explaining the third parameter to visitlist, param. Sometimes, the routine you want applied to every element of a list requires some other parameter each time it is called. For example, when processing statements, the processing routine needs access to the declaration descriptor of the containing method so that it can process return statements correctly. The param argument provides a way to pass such information to the routine invoked by visitlist.

When visitlist invokes the routine passed as action, it actually passes two parameters: a pointer to a sub-tree which is an element of the list being processed and whatever value was passed to it as param.

Thus, in the preceding example, if we wanted to provide processStmt with a pointer to the declaration descriptor for the containing method, the declaration of processStmt could be changed to:

int processStmt( node *stmt, decldesc *containingMethod )
{
...
}

and, assuming that curMethod is a pointer to the declaration descriptor of the method whose statements are being processed, the complete call to visitlist would look like:

visitlist( elsePart, processStmt, curMethod)

In cases where there is no need to pass such an extra parameter to the routine invoked by visitlist, you can just pass the value "0" and declare a dummy parameter for your action routine.

As an extra little feature, I should mention that visitlist is smart enough to skip Nerror nodes for you.
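
To make the calling convention concrete, here is a minimal sketch of how a routine in the spirit of visitlist might walk a list, skip error nodes, and hand the extra parameter to the action. Everything in it is an assumption made for illustration: the type sketch_node, the item/next list layout, the SK_STMT and SK_ERROR tags, and the names sketch_visitlist and countStmt are all hypothetical. The real visitlist in the provided object files may be organized quite differently, including in what it returns, and it declares param as an int where this sketch uses a void pointer.

```c
#include <stddef.h>

/* Hypothetical node layout: a list cell holds one element in 'item'
   and the rest of the list in 'next'. The real structure is defined
   in syntree.h. */
enum { SK_STMT, SK_ERROR };

typedef struct sketch_node {
    int kind;
    struct sketch_node *item;
    struct sketch_node *next;
} sketch_node;

typedef int (*sketch_action)(sketch_node *elem, void *param);

/* Apply 'action' to every element of the list, skipping error nodes
   and empty slots. Returns the sum of the action's results, which is
   an arbitrary choice made for this sketch. */
int sketch_visitlist(sketch_node *listhead, sketch_action action, void *param)
{
    int total = 0;
    sketch_node *cell;

    for (cell = listhead; cell != NULL; cell = cell->next) {
        if (cell->item == NULL || cell->item->kind == SK_ERROR)
            continue;
        total += action(cell->item, param);
    }
    return total;
}

/* Example action: uses 'param' as a pointer to a running count of the
   statements visited. */
int countStmt(sketch_node *stmt, void *param)
{
    int *counter = param;

    (void)stmt;          /* the statement itself is not examined here */
    *counter += 1;
    return 1;
}
```

With these definitions, a call of the form sketch_visitlist( elsePart, countStmt, &count ) would add to count the number of non-error statements in the list, which is the same shape as passing curMethod through to a statement-processing routine.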

Finally, to let me (and you) know that you have succeeded in building complete declaration descriptors, your code for phase 1 should produce a readable "dump" of the symbol table declaration descriptors it creates. To make this easy, I have incorporated a routine named DumpDecldescs in the .o files provided to you. If you call this routine with a pointer to your syntax tree after resolving all identifiers, it will print out the contents of each declaration descriptor. Like printree, this routine will only work correctly if the fields of the syntax tree and symbol table entries are valid.

Computer Science 434
Department of Computer Science
Williams College