Woolite Compiler Implementation Project
Phase ??? : Building a Parser with YACC
Due: Apr. 14, 2006

Given the limited time available, I don't actually have you write a complete compiler in this course. In particular, Rather than have you take the time to build a complete syntactic analyzer for Woolite, I only ask you to construct a parser and scanner for a subset of the language. This will give you experience using a parser and scanner generation tools, but will take much less time than the construction of a complete Woolite syntax analyzer. The construction of such a mini-parser will be the goal of this assignment. Since it will never really be part of the Woolite compilers you complete at the end of the semester, I haven't given it a phase number. In the remaining "phases" of the project, you will continue to use the scanner and parser I provide.

The language your parser should accept is defined by the grammar found at the end of this handout. This grammar was derived from the actual grammar for Woolite by eliminating most forms of statements and expressions, support for inheritance, and support for arrays (and probably a few other things I have forgotten).

You should begin by creating a new directory for this phase of the project (While you will not use the parser you produce for this assignment later in the project, you will use the code you produced for phase 2. So, you should avoid modifying phase 2 in any way while working with YACC and Lex.) Copy the files Makefile, parser.y and scanner.l from the directory ~tom/shared/434/woolite/yaccing into the directory you have created for your parser. There are several other files in the yaccing directory, but you should only copy these three!

Remember that in order to allow make to properly account for the references your source file makes to '.h' files you must execute the command:

make depend
after first making these copies and whenever you add or change any #includes.

The parser.y file contains the beginnings of the parser you need to create. In particular it contains %token declarations for all of the terminal symbols used. It also contains includes for several .h files that describe the syntax trees to be produced and the interfaces of several routines I have provided that you may want to use.

Obviously, you need to add the specification of the grammar itself to this file. You will also need to add %type declarations for the non-terminal names you use. Remember that once you start adding semantic actions, YACC will be unable to produce a parser without lots of type errors until you finish the job. So, I strongly urge you to first enter a valid grammar for the language and get YACC to produce a working parser that does nothing other than decide whether an input is or is not valid. Then add your semantic actions.

The scanner.l file similarly contains the skeleton of a Lex specification for a scanner. You will need to add regular expressions and actions for the tokens associated with the subset of Woolite you need to parse. Like the parser file, this file includes several .h files you will need to complete the scanner. It also defines a yyerror function which is used by the generated scanner and parser to print error messages.

One of the .h files that you will be using in this project is named syntax.h. This file contains declarations of the types used when associating semantic values with non-terminals during the parse. The most important declarations from this file are shown below.

struct lexval {
  int line;
};
     
struct listhead {
  node *head, *tail;
};

typedef union {
  node *treenode;
  struct lexval lexeme;
  struct listhead list;
} YYSTYPE;

/* Tree building operations provided by syntree.c */

   /* makesingle  -  Make a node with a one child.    */
node *makesingle(nodetype op, node *child, int line);

   /* makepair  -  Make a node with a two children.    */
node *makepair(nodetype op, node *child1, node *child2, int line);

   /* maketriple  -  Make a node with a three children.    */
node *maketriple(nodetype op, node *child1, node *child2, node *child3, 
		 int line);


Declarations from ~tom/shared/434/woolite/syntax.h
 

The most important declaration is that of the union type YYSTYPE. As explained in class, this is the type that describes to YACC the values that will be associated with tokens and non-terminals. The union type this declaration associates with the name YYSTYPE includes three members. The treenode member of the union is used when a syntax tree node is to be associated with a non-terminal or terminal. The lexeme member of the union is used for most tokens returned by the scanner. It simply provides a way that the scanner can associate a line number with each token it returns to the parser. The scanner should associate Nident and Niconst nodes with identifier and constants when they are returned as tokens. A lookup function is provided that will keep a hash table of all identifier seen and build a unique Nident node for each identifier. Note: The scanner is also expected to return an Nident node for the keyword integer even though it is treated as a distinct token type.

The list member of the YYSTYPE union can be used when building any of the various lists found in our syntax trees. If you can't figure out what it is for, remember that I explained in class that right recursive rules are evil if a bottom up parser is being used.

Several functions are provided in the file syntree.o that make the actual construction of syntax tree nodes simple. The functions makesingle, makepair and maketriple can be used to create nodes with 1, 2 or 3 children respectively. (You can use makesingle to create nodes to represent the special literal null.) In addition to pointers to the nodes which are to become the children of the new node, each of these routines expects a node type argument (op) and a line number that should be associated with the new node (line). The other functions declared in syntax.h -- errornode and makelist -- will not be of use in your parser.

You will not need to create a main.c file for this phase. The Makefile is set up to include the file ~tom/shared/434/woolite/yaccing/main.o in the woolite executable it builds using your parser. The function main defined in this main.o file is basically like the main I gave you to start phase 1. It initializes the symbol table, calls yyparse and then uses printree to display the syntax tree built by the parser. The new main, however, has two handy features. If you give it a file name as a parameter it will read its input from that file instead of from its standard input. So, you can type either

woolite sample.j
or
woolite < sample.j
More importantly, the new version of main recognizes one "option". If you type
woolite -d ... 
the main program will set a flag that enables debugging features of the parser built by YACC before invoking yyparse. When this flag is set, your YACC-built parser will display somewhat readable messages telling you which terminals it has read and which reductions it has applied. You will probably find this very helpful, since the only other feedback YACC will give you if something goes wrong during a parse is the message syntax error. That's right, not even a line number!

Grammar for Woolite subset

< program > -> < class declaration >

< class declaration > -> class < identifier > {
< declaration list >
}
< declaration list > -> < declaration list > < declaration >
| < declaration >
< declaration > -> < class declaration >
| < method declaration >
| < variable declaration >

< variable declaration > -> < type specification > < identifier > ;
< type specification > -> < identifier >
| int

< method declaration >

-> < return type > < identifier > < formals part > {
< method body >
}
< return type > -> void
| < type specification >
< formals part > -> ( < formals list > )
| ( )
< formals list > -> < formals list >, < formal declaration>
| <formal declaration>
< formal declaration > -> < type specification > < identifier >

< method body > -> < variable declaration list > < statement list >
< variable declaration list > -> < variable declaration list > < variable declaration >
|

< statement list > -> < statement list > < statement >
| < statement >
< statement > -> < invocation > ;

< invocation > -> < identifier > < actuals >
| < expression > . < identifier > < actuals >
< actuals > -> ( < actual list > )
| ( )

< actual list > -> < actual list > , < expression >
| < expression >

< expression > -> null
| < integer literal >
| < variable >
| < invocation >

< variable > -> < identifier >


Computer Science 434
Department of Computer Science
Williams College