An Intermediate Form for Woolite Programs

This document describes the intermediate representation that your compilers will use for Woolite programs. The scheme we will use is based on the idea of syntax trees. Therefore, much of this document will be concerned with the structure of trees for Woolite programs. Syntax trees typically include some pointers into the compiler's symbol table. Accordingly, this document also includes a specification of a symbol table organization for a Woolite compiler. In fact, since the syntax trees depend upon the symbol table, we will begin with a summary of the organization of the symbol table.

Symbol Table Organization Overview

The symbol table maintained by your compiler will consist of two main components. The first is a collection of dynamically allocated structures containing one element for each distinct identifier used in the program. We will refer to this collection of structures as the identifier table and to its elements as identifier descriptors. Each of these elements will contain a pointer to the character string representation of the identifier with which it is associated and several other link fields. This table will be created by the scanner.

The second component is a collection of dynamically allocated structures including one element for each distinct declaration or definition in the program. We will refer to this collection of structures as the declaration table and its elements as declaration descriptors. These entries will be created and initialized by the semantic processing phase of your compiler. Each of these elements will include a pointer to the identifier descriptor for the identifier with which it is associated; attribute fields describing important characteristics of this declaration of the identifier (such as whether it is a method, a class, or a variable); and several additional link fields.

In the syntax trees produced as output of the syntactic analysis phase, identifiers will be represented by pointers to their identifier descriptors. During the semantic processing phase, references to identifiers within the syntax tree will be modified so that references to identifier descriptors are augmented with references to the appropriate declaration descriptors.

The process of determining which declaration should be associated with each identifier at different points in the program text will require the use of a third collection of dynamically allocated structures used to represent bindings between identifiers and declarations. In fact, your semantic processing phase should create two sets of binding descriptors.

The first collection of binding descriptors will be used for those identifiers that are associated with declarations based on static scope rules. Your code must ensure that if an identifier has a meaning in the scope your compiler is currently processing, then its identifier descriptor will point to a binding descriptor that in turn points to the declaration descriptor associated with that identifier in that scope. As a result, when processing a class or method definition, you will create both declaration descriptors and binding descriptors as you process the declarations in the class or method body.

In order to ensure that bindings that are temporarily hidden while you process a scope are restored when processing of the scope is completed, your code will maintain two types of collections of binding descriptors. For every identifier, you will maintain a stack of all of the active, statically scoped binding descriptors for that identifier. For each scope, you will maintain a list of all of the binding descriptors created for that scope. When you leave a scope, your code will pop the binding stacks of all of the identifiers that were bound in the scope to restore the bindings that were in force in the outer scope.

You will also create binding descriptors for identifiers used as method names. The associations of names with methods do not follow simple nested scope rules. When the code in a Woolite program invokes a method on an object, your compiler must be able to access the declaration descriptor for the method within the class of the object on which it is invoked. That class may not even be accessible within the current scope.

Your code must maintain a hash table used to locate the declaration descriptors for method names that appear in such invocations. Each "bucket" in this hash table will be a list of method name binding descriptors. Each method name binding descriptor will include pointers to the declaration descriptor for the method and a class in which the method was defined or inherited. Given a method name and a pointer to the declaration descriptor for a class, this table will enable one to locate the declaration descriptor for the method if the method is indeed associated with the specified class. This search structure will be created and maintained by the semantic processing phase of your compiler.

The format of the structures used to hold identifier descriptors, declaration descriptors, and binding descriptors is discussed below, after the specification of the syntax trees for Woolite.

Syntax Tree Organization

As discussed in class, there is a significant difference between the internal nodes of a syntax tree and most of its leaves. Within an internal node, one finds a phrase type and pointers to sub-trees. The leaves, on the other hand, usually hold information about identifiers and constants. In fact, in class I have suggested that rather than actually having separate nodes for the leaves, one could use symbol table entries for leaf nodes.

We will not actually do this in the compilers you build. The reason is a simple, practical one. To generate good error messages, one needs to keep information about where in the source program the text that corresponds to each sub-tree of the syntax tree can be found. We will do this by storing in each node the line number on which the first token that belongs to the phrase the node represents was found. This cannot be done for identifiers if all occurrences of an identifier are represented by a single symbol table entry. Instead, we represent identifiers by nodes that contain the line numbers on which they were found and pointers to the appropriate symbol table entries. Similar nodes will be used for constants.

As part of semantic processing (or as the first step in code generation), you will rewrite the trees that the parser produces for variable references. Basically, while the parser creates trees based on the syntactic structure of the source code, the code generator would prefer trees corresponding closely to the capabilities of the underlying hardware. Variable references, particularly subscripted variables, can be reconstructed by the semantic processing routines so that they explicitly describe much of the addressing arithmetic required by the variable references they represent.

Two special node types are used to support this translation of variable reference subtrees. The first is an internal node type used to represent the root of a variable reference subtree. These nodes will each hold a single pointer to the subtree that describes the variable reference. The other is a node type used to represent references to the frame of the currently executing method. Such method frame nodes do not appear in the trees produced by the parser but are needed to translate variable reference subtrees into a form that explicitly describes the required address arithmetic. These nodes will always appear as leaves in the tree.

Representing Syntax Tree Nodes

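This leads to a syntax tree with five distinct node types. To specify the general "type" of a syntax tree node, we use the C union type node described below, which combines these five node structures with a sixth structure, unknode, used to access the fields they have in common.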

      /* Union type that combines the 6 structure types used to describe */
      /* tree nodes */
typedef union nodeunion {
  struct unknode unk;
  struct internalnode internal;
  struct identnode ident;
  struct iconstnode iconstant;
  struct sconstnode sconstant;
  struct refvarnode var;
} node;
Every structure type in this union includes two common fields: a field specifying the node's phrase type (type) and a field specifying the line of the source code on which the first token of the phrase represented by the node's subtree occurred (line). All of them except refvarnode also include an operand field that may be used in later phases of the compiler. The type unknode, whose definition is shown below,

          /* The type 'unknode' provides a template that can be used to */
          /* access the common components found in all node types when  */
          /* the actual type of the node is unknown.                    */
struct unknode {
  nodetype type;
  int line;
  union opdesc * operand;
};

allows one to reference these fields in situations where the actual type of the node is not yet known. For example, if 'root' is a pointer to a node of unknown type, one can use the expression:

    root->unk.type
to determine its phrase type. One could also use the expression 'root->internal.type' or 'root->ident.type', but these expressions misleadingly suggest that the type of the node is already known. The type unknode is provided to support clear coding.
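To make the idea concrete, here is a small sketch of code that inspects a node of unknown type through the unk member. The declarations are trimmed stand-ins for the real ones (only a few nodetype values, and an internalnode with a fixed three-element child array), and kindOf is a hypothetical helper name used only for illustration.

```c
#include <stddef.h>
#include <string.h>

/* Trimmed stand-ins for the handout's declarations: nodetype lists only a
   hypothetical subset of phrase types, and only three structures are kept. */
typedef enum { Nident, Niconst, Nsconst, Nrefvar, Nplus } nodetype;

union opdesc;                          /* left incomplete, as in the handout */
union nodeunion;

struct unknode {                       /* common fields only */
  nodetype type;
  int line;
  union opdesc *operand;
};

struct internalnode {
  nodetype type;
  int line;
  union opdesc *operand;
  union nodeunion *child[3];
};

struct iconstnode {
  nodetype type;
  int line;
  union opdesc *operand;
  int value;
  int ischar;
};

typedef union nodeunion {
  struct unknode unk;
  struct internalnode internal;
  struct iconstnode iconstant;
} node;

/* Hypothetical helper: describe a node whose concrete structure is not yet
   known.  Only the shared 'type' field is read, through the 'unk' member. */
const char *kindOf(const node *root) {
  if (root == NULL)
    return "empty";
  switch (root->unk.type) {
    case Nident:  return "identifier";
    case Niconst: return "integer/char constant";
    case Nsconst: return "string constant";
    case Nrefvar: return "variable reference";
    default:      return "internal node";
  }
}
```

Reading the type field through unk is safe here because every member of the union begins with the same initial fields.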

The structure type internalnode describes the nodes used to represent almost all of the internal nodes of the tree and several types of leaf nodes.

struct internalnode {
  nodetype type;
  int line;
  union opdesc * operand;
  union nodeunion *child[MAXKIDS]; /* pointers to the node's sub-trees */
};

In addition to the common type, line, and operand components, a node of type internalnode includes a component child, which is an array of pointers to its children. The number of children of a given node can be determined from its node type. The syntactic analysis routines provided conserve memory by allocating space only for the child pointers actually used by a given internal node. Thus, if this document indicates that a node has only 2 children, its third child pointer (child[2]) must not be used for any purpose.
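Since the child count depends only on the phrase type, it can be convenient to keep that knowledge in one helper. The following is a sketch, assuming a nodetype enumeration whose names match the phrase types in the table below; the ordering of the enumerators here is invented, and numChildren is a hypothetical helper, not part of the provided code. The counts follow the "Num. of Children" column of that table.

```c
/* Hypothetical version of nodetype; the real enumeration lives in the
   provided header and may order its members differently. */
typedef enum {
  Nclass, Ndecllist, Nvardecl, Narraydim, Nmethod, Nbody, Nstmtlist,
  Nvocation, Nupvocation, Nasgn, Nretn, Nif, Nwhile, Nactuallist,
  Nrefvar, Nsubs, Nnew, Ntypecast, Null, Nthis, NFramePtr,
  Nnot, Nneg, Nlength,
  Nor, Nand, Nlt, Ngt, Neq, Nle, Nge, Nne, Nplus, Nminus, Ntimes, Ndiv,
  Nident, Niconst, Nsconst, Nerror
} nodetype;

/* Number of subtrees of a node with the given phrase type.  Note that an
   Nrefvar's single subtree is held in 'baseaddr', not in a child array,
   and leaf structures (Nident, Niconst, Nsconst) have no child array. */
int numChildren(nodetype t) {
  switch (t) {
    case Nmethod:
      return 4;
    case Nclass: case Nvocation: case Nupvocation: case Nif:
      return 3;
    case Ndecllist: case Nvardecl: case Narraydim: case Nbody:
    case Nstmtlist: case Nasgn: case Nwhile: case Nactuallist:
    case Nsubs: case Ntypecast:
    case Nor: case Nand: case Nlt: case Ngt: case Neq: case Nle:
    case Nge: case Nne: case Nplus: case Nminus: case Ntimes: case Ndiv:
      return 2;
    case Nretn: case Nrefvar: case Nnew:
    case Nnot: case Nneg: case Nlength:
      return 1;
    default:   /* Null, Nthis, NFramePtr, Nerror, and the leaf types */
      return 0;
  }
}
```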

There are three types of "internal nodes" that are really leaves of the tree. Nodes of type Null are used to represent occurrences of the expression null in the source program, nodes of type Nthis are used to represent the expression this, and nodes of type NFramePtr are used to represent references to the frame of the executing method. Nodes of these three types have no children.

The structure types identnode, iconstnode, and sconstnode are used to represent the leaves of the syntax trees produced by the parser. Declarations of the structure types are shown below:

         /* Nodes of type 'ident' are used for leaf nodes corresponding */
         /* to identifiers in the source code.  The value in the        */
         /* 'type' component of such a node will always be 'Nident'.    */
struct identnode {
  nodetype type;
  int line;
  union opdesc * operand;
  identdesc *ident;   /* Pointer to associated identifier descriptor */
  decldesc *decl;     /* Pointer to associated declaration descriptor */
};

         /* Nodes of type 'iconst' are used for leaf nodes corresponding */
         /* to character and integer constants in the source code.       */
         /* The value in the 'type'  component of such a node will       */
         /* always be 'Niconst'.           */
struct iconstnode {
  nodetype type;
  int line;
  union opdesc * operand;
  int value;               /* Integer value of the constant  */
  int ischar;              /* True if this was a character constant */
};


         /* Nodes of type 'sconst' are used for leaf nodes corresponding */
         /* to string constants in the source code.                      */
         /* The value in the 'type'  component of such a node will       */
         /* always be 'Nsconst'.           */
struct sconstnode {
  nodetype type;
  int line;
  union opdesc * operand;
  char * value;            /* value of the constant  */
};

Identifiers are represented by nodes of type identnode. The type component of such nodes will always be Nident. The ident and decl components of an identnode are pointers to the appropriate identifier descriptor and declaration descriptor for the identifier being referenced. The decl components of identnode nodes are set to NULL (the value 0) by the syntactic analyzer. During semantic analysis, the correct values should be stored in these fields.
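For names governed by static scope rules, one simple way to fill in the decl field of an identifier leaf is to follow the identifier descriptor's bindStack pointer to the innermost active binding (the identdesc and scopedBinding structures are described in the symbol table sections of this document). The sketch below uses pared-down stand-ins for those types, with decldesc reduced to a plain structure rather than the real union, and resolveIdent is a hypothetical helper name. Method names used in qualified invocations are resolved differently, through the method-name hash table.

```c
#include <stddef.h>

/* Pared-down stand-ins: only the fields this sketch touches are kept, and
   decldesc is a plain struct here rather than the real union. */
typedef struct decl { int line; } decldesc;

typedef struct scopedBinding {
  decldesc *descr;                      /* the bound declaration        */
  int level;                            /* nesting level of the binding */
  struct scopedBinding *bindStackNext;  /* outer binding it hides       */
  struct scopedBinding *scopeNext;      /* next binding in same scope   */
} scopedBinding;

typedef struct iddesc {
  char *name;
  struct iddesc *hashlink;
  scopedBinding *bindStack;             /* innermost active binding     */
} identdesc;

struct identnode {                      /* operand field omitted here   */
  int type;
  int line;
  identdesc *ident;
  decldesc *decl;
};

/* Hypothetical helper: store the declaration visible at this point in the
   leaf's 'decl' field.  Returns 0 and leaves 'decl' NULL when no open
   scope binds the identifier (i.e. it is undeclared here). */
int resolveIdent(struct identnode *leaf) {
  scopedBinding *top = leaf->ident->bindStack;
  if (top == NULL)
    return 0;
  leaf->decl = top->descr;
  return 1;
}
```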

There is one special group of identnodes produced by the syntactic analyzer. These are identnodes for the keyword int. Technically, int is a keyword rather than an identifier in Woolite. Treating it as an identifier that has been declared as a class, however, simplifies various parts of the compiler. Accordingly, occurrences of int will be represented by special identnodes in the syntax tree. In addition, I will provide a function called init_symtab that will create a declaration descriptor for int and link this declaration descriptor to the identifier descriptor for int through a binding descriptor.

Each iconstnode contains two fields beyond the common type, line, and operand fields. The first, named value, holds the integer value of the constant. The second, named ischar, is used as a boolean flag indicating whether the constant found in the source code was a character or an integer. The type component of all such nodes will be Niconst.

Each sconstnode contains one field beyond the common type, line, and operand fields. It is named value, and it holds a pointer to the character string that is the value of the constant. The type component of all such nodes will be Nsconst.

There is one additional node type, used in subtrees representing references to variables. Its declaration is shown below. Tree nodes of type refvarnode are used to designate places where a value is loaded from a memory address (or, when the node is the target of an assignment, stored into it).

        /* Refvar nodes are included by the parser as the roots of all     */
        /* variable reference subtrees.  When created by the parser, the   */
        /* "baseaddr" field will either point to an Nsubs or              */
        /* Nident node.  During semantic analysis the "baseaddr" subtree   */
        /* will be converted into a subtree describing the calculation of  */
        /* the memory address for the variable.  A "displacement" field is */
        /* included to hold a constant offset from the base address to the */
        /* variable.   */
struct refvarnode {
  nodetype type;
  int line;
  union nodeunion
      * baseaddr;       /* Subtree describing base address calculation     */
  int displacement;     /* Displacement to variable relative to base addr  */
  decldesc *vardesc;    /* Declaration descriptor for referenced variable  */
};
The refvarnode type
 

The baseaddr field of a refvarnode points to a subtree that describes the location in memory being referenced. The value of the displacement field gives a constant value to be added to the address of this location before accessing memory. This field is initialized to 0 in trees created by the parser. The vardesc field is intended to point to a declaration descriptor for the variable referenced. You may ignore this field for now. It may serve a purpose during an optimization phase later in the project.
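As an illustration of the kind of rewriting described for variable references, here is a minimal sketch that turns a parser-built Nrefvar for a simple local variable into a frame-relative access: the Nident baseaddr is replaced by an NFramePtr leaf and the variable's offset goes into displacement. The frame layout (each local in a WORDSIZE-sized slot indexed by its position, starting at a hypothetical LOCALS_BASE offset) is purely an assumption for the example, as is the helper name rewriteLocalRef; real offsets depend on the calling conventions you adopt later.

```c
#include <stdlib.h>

/* Hypothetical frame layout constants, for illustration only. */
#define WORDSIZE    4
#define LOCALS_BASE 8

typedef enum { Nident, Nrefvar, NFramePtr } nodetype;   /* subset */

union nodeunion;
struct unknode { nodetype type; int line; };
struct refvarnode {
  nodetype type;
  int line;
  union nodeunion *baseaddr;   /* subtree computing the base address */
  int displacement;            /* constant offset from that address  */
};
typedef union nodeunion {
  struct unknode unk;
  struct refvarnode var;
} node;

/* Hypothetical helper: rewrite an Nrefvar whose baseaddr is the Nident of
   the local variable in slot 'varPosition' into "frame pointer + offset".
   Returns 0 if allocation fails, leaving the node unchanged. */
int rewriteLocalRef(node *ref, int varPosition) {
  node *fp = malloc(sizeof(node));
  if (fp == NULL)
    return 0;
  fp->unk.type = NFramePtr;
  fp->unk.line = ref->var.line;
  /* The old Nident leaf is no longer referenced from here (not freed). */
  ref->var.baseaddr = fp;
  ref->var.displacement = LOCALS_BASE + WORDSIZE * varPosition;
  return 1;
}
```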

Node Phrase Types

As mentioned above, the phrase types Nident, Niconst, Nsconst, Nthis, NFramePtr, and Null are used to label nodes representing the leaves of the syntax tree. The phrase types Nrefvar and NFramePtr are used to identify the two special node types used to encode variable reference subtrees. All of the other node phrase names defined in the enumeration type nodetype are used to label internal nodes.

There are several important subgroups of node phrase types. One important group is the group of "list" phrases including Nstmtlist, Ndecllist, and Nactuallist. These nodes are used to represent lists of items in the program. In all cases, such nodes take 2 children. The left child (child[0]) of a list node points to the first element of the list (i.e., a statement, expression, variable declaration, or whatever element type is appropriate). The right child (child[1]) points to the remainder of the list. Its value is either NULL (= 0) or a pointer to another list node of the same type.
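A loop over any of these list types follows the same pattern. This sketch counts the elements of a list, skipping the Nerror placeholders that, as the phrase-type table notes, can appear as list elements. The node declarations are trimmed stand-ins and countListElements is a hypothetical helper.

```c
#include <stddef.h>

typedef enum { Nstmtlist, Nasgn, Nretn, Nerror } nodetype;   /* subset */

union nodeunion;
struct unknode      { nodetype type; int line; };
struct internalnode { nodetype type; int line; union nodeunion *child[2]; };
typedef union nodeunion {
  struct unknode unk;
  struct internalnode internal;
} node;

/* Walk a list: child[0] is the element and child[1] is either NULL or the
   next list node of the same type.  Counts elements, skipping Nerror. */
int countListElements(const node *list) {
  int count = 0;
  for (const node *p = list; p != NULL; p = p->internal.child[1]) {
    const node *elem = p->internal.child[0];
    if (elem != NULL && elem->unk.type != Nerror)
      count++;
  }
  return count;
}
```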

Other important groups of phrase types include the statement phrase types (Nasgn, Nvocation, Nupvocation, Nretn, Nif, and Nwhile); the variable phrase types (Nident and Nsubs); and the expression phrase types (which include Null, Nthis, and NFramePtr in addition to all the "unaries" and "binaries" mentioned in the table below).

All of the phrase names used in internal tree nodes are described in the list below.

Node Type   Num. of Children   Description

Nclass   3   Represents a class declaration. Child[0] is the Nident node for the class name. Child[1] is either NULL or the Nident node for the class that this definition extends. Child[2] is an Ndecllist node that starts the list of member declarations for the class. This list may contain other class declarations, method declarations, and variable declarations. The root of the full program tree will be an Nclass node.

Ndecllist   2   List header used to build lists of the three declaration node types: Nvardecl, Nclass, and Nmethod. All three types of declarations may occur in an Ndecllist associated with an Nclass node. Only Nvardecl nodes will be found in the Ndecllist nodes associated with Nmethod nodes (i.e., the parameter lists and local variable lists).

Nvardecl   2   Used to represent instance variable, local variable, and formal parameter declarations. Child[0] is the Nident node for the identifier being declared. Child[1] is either a pointer to an Nident node for the type of the name being declared or a pointer to an Narraydim node if the name is declared to refer to an array. Remember that a special symbol table entry is created during initialization to allow uniform treatment of the type name int.

Narraydim   2   Used to represent the number of dimensions of an array type and the size of its first dimension. These nodes appear both in type specifications found in method and variable declarations and in array constructions. Child[0] will point either to another Narraydim node or to an Nident node for a class name or int. Child[1] may refer to an expression describing the array's size. This field is used only in Narraydim nodes that (a) are part of Narraydim subtrees found in construction expressions and (b) have an Nident node as Child[0].

Nmethod   4   Used to represent the definition of a method. Child[0] is an Nident node for the method's name. Child[1] is NULL if the method is declared void; otherwise, it points to an Nident node or Narraydim node describing the method's return type. Child[2] is a (possibly NULL) list of Ndecllist nodes containing Nvardecl nodes for the method's formal parameters. Child[3] is an Nbody node for the method's body.

Nbody   2   Used to represent the body of a method. Child[0] is a (possibly NULL) list of Ndecllist nodes that refer to Nvardecl nodes for the method's local variables. Child[1] is a list of Nstmtlist nodes.

Nstmtlist   2   Used to represent statement lists. Child[0] will be a node of one of the six statement phrase types described next (Nvocation, Nupvocation, Nasgn, Nretn, Nif, and Nwhile) or another Nstmtlist node.

Nvocation   3   Represents a method invocation (as either a statement or an expression). Child[0] will be an Nident node for the method's name. Child[1] will be NULL if the method's name is accessed using static scope rules; otherwise, it will refer to a subtree representing an expression that describes the object to which the method should be applied. Child[2] points to a (possibly NULL) list of Nactuallist nodes.

Nupvocation   3   Used to represent an invocation of a superclass method through the keyword super. The children of this node are identical to those of an Nvocation node, except that Child[1] will always be NULL.

Nasgn   2   Represents an assignment statement. Child[0] will be a node of type Nrefvar pointing to a subtree that describes the target of the assignment. Child[1] will be a node whose type is classified as an expression.

Nretn   1   Represents a return statement. If an expression was included in the statement, child[0] points to a sub-tree representing the expression. Otherwise, child[0] is NULL.

Nif   3   Represents an if statement. Child[0] points to a sub-tree representing the "boolean" expression. Child[1] points to a list of Nstmtlist nodes that represents the then part. If an else part was included, child[2] points to the list of Nstmtlist nodes representing the else part. Otherwise, child[2] is NULL. Note that the last two children will be list nodes even if only a single statement is included in the then or else part.

Nwhile   2   Represents a while statement. Child[0] points to a tree representing the loop termination condition. Child[1] points to a statement list representing the loop body. The parser rewrites for loops as Nwhile subtrees.

Nactuallist   2   Used to represent lists of actual parameters. Child[0] will be a node of one of the expression phrase types.

Nrefvar   1   Nrefvar nodes are stored using the refvarnode type rather than the internalnode type, so their single subtree is reached through the baseaddr field rather than child[0]. They do, however, appear as internal nodes in the tree: as the roots of the variable subtrees pointed to by Nasgn nodes and within expression subtrees. In the trees produced by the parser, the baseaddr of such a node will point to either an Nident or an Nsubs node. After semantic processing, it will point to a node of some expression phrase type.

Nsubs   2   Represents a variable (or expression) formed by subscripting an array. Child[0] represents an expression that describes the array. Child[1] points to an expression sub-tree for the subscript expression.

Nnew   1   Represents an object construction. Child[0] will point to an Nident node for the name of the class describing the object to be created or to an Narraydim node if an array is being constructed.

Ntypecast   2   Represents a type cast expression. Child[0] will point to an Nident node for the name of the target class. Child[1] points to an expression sub-tree describing the object to be cast.

Null   0   Represents an occurrence of the expression null.

Nthis   0   Represents an occurrence of the expression this.

NFramePtr   0   Represents a reference to the frame of the current method.

unaries   1   The node labels Nnot, Nneg, and Nlength are used to represent expressions formed using the logical not operator (!), the arithmetic negation operator (unary -), and the array length operator. Child[0] points to a sub-tree representing the expression to whose value the operator should be applied.

binaries   2   The node labels Nor, Nand, Nlt, Ngt, Neq, Nle, Nge, Nne, Nplus, Nminus, Ntimes, and Ndiv are used to represent expressions formed using the logical, relational, and arithmetic binary operators. The sub-expressions to whose values the operator should be applied are pointed to by child[0] and child[1].

Nerror   0   Inserted in the tree at points where an error was detected in the syntax of a phrase. Such nodes only ever appear as elements of "lists", so the only place you need to check for them is when processing statement lists, parameter lists, etc.

Symbol Table Details

Now, to complete the discussion of our scheme for representing Woolite programs, we must discuss more details of the types used in the symbol table. As explained in the overview presented above, the symbol table is composed of identifier descriptors, binding descriptors, and declaration descriptors.

Identifier Descriptors

Identifier descriptors are actually quite simple. There is one such descriptor for each distinct identifier used in the program. A C language structure specification for the type identdesc used to store identifier descriptors is shown below.

 
typedef struct iddesc {
  char *name;                       /* The character-string form of the identifier */
  struct iddesc *hashlink;          /* Link for hash chains used by scanner        */
  struct scopedBinding *bindStack;  /* Head pointer for stack of bindings of       */
                                    /*   this identifier in currently open scopes. */
} identdesc;
Declaration for Type `identdesc'
 

The name field is just a pointer to the characters that form the identifier. The hashlink field is used to maintain lists of identifiers with the same hash value when building the hash table used by the scanner. It will not be of concern to you when doing semantic processing. The bindStack component is to be used as a pointer to the top of the stack of bindings to declarations of the identifier found in scopes that are still open. The scanner initializes this field to NULL.

Scopes and Scope Bindings

The relationship between an identifier and a declaration for that identifier is represented by a scopedBinding or methodBinding structure. scopedBinding structures are used for bindings governed by nested scope rules. methodBinding structures associate a method name, together with a class, with the declaration of the method that the name denotes when it is invoked on an object of that class.

The specification for the type scopedBinding is shown below.

/* Structure used to keep track of the bindings between identifiers and */
/* individual declarations that are active within a given scope and the */
/* bindings (if any) that they are masking according to nested scope    */
/* rules.                                                               */
typedef struct scopedBinding {
  decldesc * descr;                      /* The bound declaration                      */
  int level;                             /* The nesting level of this binding          */
  struct scopedBinding * bindStackNext;  /* Outer binding hidden by this one, or null   */
  struct scopedBinding * scopeNext;      /* Link for list of all this scope's bindings */
} scopedBinding;
Declaration for Type `scopedBinding'
 

The descr field refers to the declaration associated with the identifier through this binding. The level field records the level of the scope in which this association was made. Note that this level can be different from the level in which the declaration itself occurred in the case of a method inherited from a class declared at a different nesting level. The bindStackNext field is used to maintain a stack of all active declarations of the identifier associated with this binding. The scopeNext field is used to maintain a list of all the bindings made in a given scope. Your code is responsible for creating scopedBinding structures and maintaining these stacks and lists.

Structures of type scopedesc should be used to keep track of all of the currently open scopes and the bindings made within them. The declaration for this type is shown below.

   /* Structure used to keep track of open scopes and  */
   /* name/declaration pairs bound within them.        */
typedef struct scopedesc {
  struct scopedesc *container;    /* Descriptor for surrounding scope (or null)   */
  scopedBinding * bindingList;    /* Header for list of this scope's bindings     */
} scopedesc;
The bindingList field should be used as a head pointer for a list of all the bindings made in a given scope. The members of this list will be chained together through the scopeNext links found in scopedBinding structures. The container field will hold a pointer to the descriptor for the surrounding scope (or NULL if this descriptor is for the outermost class).
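The scope discipline described in the overview can be sketched with these structures. The helpers openScope, bind, and closeScope are hypothetical names, and decldesc is replaced by a small stand-in whose ident field plays the role of the COMMONFIELDS ident component (in your compiler it would be reached as descr->unk.ident). Closing a scope walks its bindingList and pops each identifier's bindStack; because both the list and the stacks are last-in-first-out, each binding being removed is the top of its identifier's stack at that moment.

```c
#include <stdlib.h>

struct iddesc;

/* Stand-in for decldesc: keeps only the back pointer to the identifier. */
typedef struct decl {
  struct iddesc *ident;
  int line;
} decldesc;

typedef struct scopedBinding {
  decldesc *descr;
  int level;
  struct scopedBinding *bindStackNext;
  struct scopedBinding *scopeNext;
} scopedBinding;

typedef struct iddesc {
  char *name;
  struct iddesc *hashlink;
  scopedBinding *bindStack;
} identdesc;

typedef struct scopedesc {
  struct scopedesc *container;
  scopedBinding *bindingList;
} scopedesc;

/* Hypothetical helper: open a new scope nested inside 'outer'. */
scopedesc *openScope(scopedesc *outer) {
  scopedesc *s = malloc(sizeof *s);
  if (s == NULL)
    return NULL;
  s->container = outer;
  s->bindingList = NULL;
  return s;
}

/* Hypothetical helper: bind 'id' to 'd' in scope 's'.  The binding is
   pushed on the identifier's stack (hiding any outer binding) and added
   to the scope's list so that it can be undone when the scope closes. */
scopedBinding *bind(scopedesc *s, identdesc *id, decldesc *d, int level) {
  scopedBinding *b = malloc(sizeof *b);
  if (b == NULL)
    return NULL;
  b->descr = d;
  b->level = level;
  b->bindStackNext = id->bindStack;
  id->bindStack = b;
  b->scopeNext = s->bindingList;
  s->bindingList = b;
  return b;
}

/* Hypothetical helper: close scope 's', restoring each identifier's outer
   binding, and return the enclosing scope. */
scopedesc *closeScope(scopedesc *s) {
  scopedesc *outer = s->container;
  scopedBinding *b = s->bindingList;
  while (b != NULL) {
    scopedBinding *next = b->scopeNext;
    identdesc *id = b->descr->ident;
    id->bindStack = b->bindStackNext;   /* pop: b is the top of the stack */
    free(b);
    b = next;
  }
  free(s);
  return outer;
}
```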

The specification for the type methodBinding is shown below.

/* Structure used to link together METHOD-NAMExCLASS pairs that hash to the */
/* same bucket in the hash table used to resolve qualified references to    */
/* method names.   */
typedef struct methodBinding {
  decldesc * method;              /* The descriptor for the method                      */
  decldesc * class;               /* The class that is associated with this binding     */
                                  /*     (must be a subclass of the class that contains */
                                  /*      the associated method declaration).           */
  struct methodBinding *next;     /* Next entry in the same hash bucket                 */
} methodBinding;
Declaration for Type `methodBinding'
 

The method field points to the declaration descriptor associated with a given method name when that name is used to invoke a method on an object of the type described by the declaration descriptor pointed to by class. The next field should be used to implement a hash table to locate such bindings.
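The following sketch shows one way the method-name hash table might be organized, assuming the key is the pair (identifier descriptor of the method name, declaration descriptor of the class). The table size, the pointer-mixing hash function, and the helpers addMethodBinding and lookupMethod are all assumptions for illustration, and decldesc is again a stand-in that keeps only the ident field.

```c
#include <stdint.h>
#include <stdlib.h>

typedef struct iddesc { const char *name; } identdesc;   /* stand-in */
typedef struct decl { identdesc *ident; } decldesc;      /* stand-in */

typedef struct methodBinding {
  decldesc *method;
  decldesc *class;
  struct methodBinding *next;
} methodBinding;

#define NBUCKETS 211                    /* hypothetical table size */
static methodBinding *methodTable[NBUCKETS];

/* Hypothetical hash mixing the two descriptor addresses. */
static size_t hashPair(const identdesc *name, const decldesc *cls) {
  uintptr_t h = (uintptr_t)name * 31u + (uintptr_t)cls;
  return (size_t)((h >> 3) % NBUCKETS);
}

/* Record that invoking 'method' by name on an object of class 'cls'
   denotes this declaration.  Returns 0 if allocation fails. */
int addMethodBinding(decldesc *method, decldesc *cls) {
  methodBinding *b = malloc(sizeof *b);
  if (b == NULL)
    return 0;
  size_t i = hashPair(method->ident, cls);
  b->method = method;
  b->class = cls;
  b->next = methodTable[i];
  methodTable[i] = b;
  return 1;
}

/* Find the method declaration for 'name' invoked on class 'cls', or NULL. */
decldesc *lookupMethod(const identdesc *name, const decldesc *cls) {
  for (methodBinding *b = methodTable[hashPair(name, cls)]; b != NULL; b = b->next)
    if (b->class == cls && b->method->ident == name)
      return b->method;
  return NULL;
}
```

In the usage pattern assumed here, a class that inherits a method gets its own binding pointing at the superclass's declaration, while a class that overrides the method gets a binding pointing at its own declaration.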

Declaration Descriptors

Declaration descriptors are more complex than identifier descriptors or bindings. Depending on the type of declaration involved (a class definition, a method definition, a variable declaration, etc.), different structures must be used. Accordingly, as with tree nodes, the type used to describe declaration descriptors is a union type. The C declaration for this union type is shown below.

    /* This union describes the type of all declaration descriptors */
typedef union dcldesc {
  struct unkdesc unk;

  struct methoddesc method;

  struct classdesc class;

  struct vardesc var;

} decldesc;

Definition of the type `decldesc'
 

While several distinct structure types are used as declaration descriptors, they share several common fields. The declarations of these common fields are grouped in a #define named COMMONFIELDS. This #define is used to include the fields in each of the distinct structure type definitions. As in the syntax tree definitions, all declaration descriptors contain a common type field used to determine the actual format of a member of the union type. The value of this type field will be an element of the enumerated type decltype. Also, a structure type unkdesc is provided to allow one to reference the common fields of a declaration descriptor before the actual type of the descriptor involved can be determined. The declarations of COMMONFIELDS, decltype, and unkdesc are shown below.

    /* Enumeration type used to label the various types of declaration */
    /* descriptors that can occur in the symbol table.                 */
typedef enum {
  methoddecl,      /* method declarations                     */
  instvardecl,     /* instance variables                      */
  locvardecl,      /* local variables                         */
  formaldecl,      /* Formal parameter names                  */
  classdecl,       /* class names (also used for int)         */
  } decltype;


    /* All declaration descriptors contain the following components */
    /* (although structure component descriptors don't use them all.) */
#define COMMONFIELDS  \
   decltype type;               /* Type of this declaration descriptor         */ \
   identdesc *ident;            /* Pointer to associated ident. descriptor     */ \
   int line;                    /* Line number at which declaration occurred.  */ \
   int deflevel;                /* nesting level of this declaration           */ \
   union dcldesc * memberlink;  /* link used to form lists of class            */ \
                                /* methods, vars, and method locals and formal */


    /* Generic structure used to access common fields of decl. descriptors. */
struct unkdesc {
   COMMONFIELDS
};
Definitions of shared features of declaration descriptors
 

The first of the common fields is the type field which holds an element of the enumeration type decltype. The field ident is used by all declaration descriptors to hold a pointer back to the identifier descriptor for the identifier associated with the declaration. The line component is used to hold the line number on which the declaration occurred. The deflevel is used to record the nesting level at which this declaration occurs.

The last component in COMMONFIELDS is memberlink. During declaration processing, this field is used as the "next" pointer of the linked lists that hold the declarations occurring within a given class or method.
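As an illustration of memberlink chaining, the sketch below reproduces the COMMONFIELDS layout (with struct iddesc * in place of the identdesc typedef and the comments trimmed), appends descriptors to a list in source order, and walks the list through the unk member. The memberList header type and the helpers appendMember and countOfType are hypothetical.

```c
#include <stddef.h>

typedef enum { methoddecl, instvardecl, locvardecl, formaldecl, classdecl } decltype;

struct iddesc;                          /* identifier descriptor, unused here */

#define COMMONFIELDS          \
   decltype type;             \
   struct iddesc *ident;      \
   int line;                  \
   int deflevel;              \
   union dcldesc *memberlink;

struct unkdesc { COMMONFIELDS };
struct vardesc { COMMONFIELDS int varPosition; };   /* trimmed */

typedef union dcldesc {
  struct unkdesc unk;
  struct vardesc var;
} decldesc;

/* Hypothetical list header tracking both ends, so members stay in order. */
typedef struct { decldesc *first, *last; } memberList;

void appendMember(memberList *l, decldesc *d) {
  d->unk.memberlink = NULL;
  if (l->last == NULL)
    l->first = d;
  else
    l->last->unk.memberlink = d;
  l->last = d;
}

/* Count the members of a given declaration type by following memberlink. */
int countOfType(const memberList *l, decltype t) {
  int n = 0;
  for (const decldesc *d = l->first; d != NULL; d = d->unk.memberlink)
    if (d->unk.type == t)
      n++;
  return n;
}
```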

Type descriptors

Within the descriptors for variables and methods, it is necessary to represent information about types, including return types, variable types, and parameter types. Structures of type typedesc are used for this purpose. The specification of this type is shown in the figure below.

typedef struct typedesc {
  union dcldesc * base;     /* pointer to base type descriptor    */
  int dimensionality;       /* number of dimensions of array or 0 */
} typedesc;
Declaration of the type `typedesc'
 

In Woolite, a type is either a class, int, or a (possibly multi-dimensional) array of some class or int. A type descriptor therefore stores a pointer to the declaration descriptor of the base type, base, and a count of the number of dimensions, dimensionality. A dimensionality of 0 means the type is not an array at all.
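To show how this representation might be used during type checking, the sketch below allocates type descriptors, compares them, and computes the type produced by indexing an array once. It assumes (this is not stated in the handout) that two types are the same exactly when they share a base declaration descriptor and a dimensionality, and that indexing an n-dimensional array yields an (n-1)-dimensional array with the same base. The names newType, sameType, and elementType are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

union dcldesc;   /* left incomplete: only pointers to it are needed here */

typedef struct typedesc {
  union dcldesc *base;   /* pointer to base type descriptor    */
  int dimensionality;    /* number of dimensions of array or 0 */
} typedesc;

/* Hypothetical helper: allocate and initialize a type descriptor. */
typedesc *newType(union dcldesc *base, int dimensionality) {
  typedesc *t = malloc(sizeof *t);
  if (t == NULL)
    abort();
  t->base = base;
  t->dimensionality = dimensionality;
  return t;
}

/* Assumed rule: same base declaration and same number of dimensions. */
int sameType(const typedesc *a, const typedesc *b) {
  return a->base == b->base && a->dimensionality == b->dimensionality;
}

/* Type of a[i] when a has type *arr; arr must be an array type. */
typedesc elementType(const typedesc *arr) {
  typedesc elem;
  assert(arr->dimensionality > 0);
  elem.base = arr->base;
  elem.dimensionality = arr->dimensionality - 1;
  return elem;
}
```

Because each class (and int) has a single declaration descriptor, comparing base pointers is enough to compare base types under this assumption.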

Variable and Formal Parameter Descriptors

All variable and formal parameter declarations are described using the var member of the decldesc union type. The value of the type field in such a declaration descriptor distinguishes the kind of variable declared. The possible type values are instvardecl, locvardecl, and formaldecl. The declaration of the type used for these descriptors is shown in the figure below.

    /* Structure used for instance variable, local variable, and formal */
    /* parameter declaration descriptors.                               */
struct vardesc {
  COMMONFIELDS
  int varPosition;          /* position within containing class or method*/
  typedesc * mytype;        /* type descriptor for the variable's type   */
  union dcldesc *owner;     /* Desc. of class or method containing decl  */
};
Declarations for variable/field name declaration descriptors
 

Beyond the common fields, the declaration descriptors of instance variables, local variables, and formal parameters include three fields. The first is varPosition, which should be set to the position of this variable within the list of all similar variables present in its scope. That is, the positions of instance variables, formal parameters, and local variables should be calculated independently of one another. For instance variables, the position should account not only for variables declared in the same class but also for any variables declared in superclasses. The second field is mytype, which should be set to point to a type descriptor for the variable or parameter type. The third is owner, which should point to the declaration descriptor for the class or method in which the name was declared.
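The rule that instance variable positions include inherited variables can be sketched as follows: a class's own variables are numbered starting from its superclass's instVarCount. This is a minimal sketch with trimmed copies of the structures (only the fields it needs), and it assumes that positions start at 0 and that the superclass is processed before the subclass. The function name numberInstVars is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

typedef enum {
  methoddecl, instvardecl, locvardecl, formaldecl, classdecl
} decltype;

/* Trimmed common fields: ident, line, and deflevel omitted for brevity. */
#define COMMONFIELDS \
   decltype type; \
   union dcldesc *memberlink;

struct vardesc {
  COMMONFIELDS
  int varPosition;          /* position within containing class or method */
  union dcldesc *owner;     /* class or method containing the declaration */
};

struct classdesc {
  COMMONFIELDS
  union dcldesc *super;     /* superclass (or NULL)                       */
  int instVarCount;         /* includes inherited instance variables      */
  union dcldesc *vars;      /* variables declared in this class only      */
};

union dcldesc {
  struct vardesc var;
  struct classdesc class;
};

/* Hypothetical helper: number the instance variables declared directly
   in cls, continuing after all inherited ones (0-based, an assumption). */
void numberInstVars(union dcldesc *cls) {
  union dcldesc *super = cls->class.super;
  int pos = (super != NULL) ? super->class.instVarCount : 0;

  for (union dcldesc *v = cls->class.vars; v != NULL; v = v->var.memberlink) {
    assert(v->var.type == instvardecl);
    v->var.varPosition = pos++;
    v->var.owner = cls;
  }
  cls->class.instVarCount = pos;   /* inherited plus locally declared */
}
```

With this numbering a subclass object can share its prefix layout with its superclass, so inherited variables keep the same positions.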

Method Descriptors

Method definitions are described using the method member of the decldesc union. The value in the type field of such a descriptor will be methoddecl. The declaration of the type used for these descriptors is shown in the figure below.

    /* Structure used for method declaration descriptors.    */
struct methoddesc {
  COMMONFIELDS
  int methodPosition;          /* Method's position in method table      */
  union dcldesc * container;   /* Descriptor for class containing method */
  int localCount;              /* Size of space required for locals      */
  union dcldesc *locals;       /* List of all local variables            */
  int paramCount;              /* Count of parameters method expects     */
  union dcldesc * formallist;  /* head of list of this method's formals. */
  typedesc * rtntype;          /* Return type (base == NULL if void)     */
  CODELBL * entrylbl;          /* Label placed on first line of method   */
  char overrides;              /* TRUE if method overrides another       */
};
Declarations for method name declaration descriptors
 

The methodPosition field holds the method's position in the method table. The container field refers to the class in which the method was defined. The fields localCount and locals are used to keep track of the count of, and the declaration descriptors for, all local variables defined within the method. Similarly, paramCount and formallist are used to keep track of the count of, and the declaration descriptors for, the method's formal parameters. The rtntype field should be set to point to a type descriptor for the method's return type or to NULL if the method is declared void. The entrylbl field will be used in later phases. The overrides field is a boolean that should record whether the method overrides a method in some superclass.
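The bookkeeping for formals and locals might look like the sketch below: each new declaration is appended to the appropriate list, numbered by the current count, and given the method as its owner. Formals and locals are numbered independently, as required for varPosition. This is a sketch with trimmed structures; it assumes 0-based positions and one slot per local variable (the handout describes localCount as the size of the space required for locals, which a real compiler might compute differently). The names addToMethod, addFormal, and addLocal are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

typedef enum {
  methoddecl, instvardecl, locvardecl, formaldecl, classdecl
} decltype;

/* Trimmed common fields: ident, line, and deflevel omitted for brevity. */
#define COMMONFIELDS \
   decltype type; \
   union dcldesc *memberlink;

struct vardesc {
  COMMONFIELDS
  int varPosition;
  union dcldesc *owner;
};

struct methoddesc {
  COMMONFIELDS
  int localCount;              /* assumed: one slot per local       */
  union dcldesc *locals;       /* list of all local variables       */
  int paramCount;              /* count of parameters               */
  union dcldesc *formallist;   /* list of this method's formals     */
};

union dcldesc {
  struct vardesc var;
  struct methoddesc method;
};

/* Hypothetical helper: number v with *count, record its owner, and
   append it at the tail of the list whose head pointer is *head. */
static void addToMethod(union dcldesc *m, union dcldesc *v,
                        union dcldesc **head, int *count) {
  v->var.varPosition = (*count)++;
  v->var.owner = m;
  v->var.memberlink = NULL;
  while (*head != NULL)
    head = &(*head)->var.memberlink;
  *head = v;
}

void addFormal(union dcldesc *m, union dcldesc *v) {
  assert(v->var.type == formaldecl);
  addToMethod(m, v, &m->method.formallist, &m->method.paramCount);
}

void addLocal(union dcldesc *m, union dcldesc *v) {
  assert(v->var.type == locvardecl);
  addToMethod(m, v, &m->method.locals, &m->method.localCount);
}
```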

Class Descriptors

Class declarations are represented using the class member of the decldesc union type. The value in the type field of such descriptors should be classdecl. The declaration of the type used for these descriptors is shown in the figure below.

    /* Structure used for class declaration descriptors.     */                                      
struct classdesc {
  COMMONFIELDS
  union dcldesc *container;    /* Descriptor for surrounding class (or null)          */ 
  union dcldesc *super;        /* Descriptor for superclass (or null)                 */ 
  int instVarCount;            /* Count of number of instance variables               */  
  union dcldesc *vars;         /* Header for list of this class' instance variables   */ 
  int methodCount;             /* Count of number of methods (including inherited)    */  
  union dcldesc *methods;      /* Header for list of this class' methods              */ 
                               /*      (not including inherited methods)              */ 
  union dcldesc *classes;      /* Header for list of nested class decls               */  
  CODELBL * methodtab;         /* label for table of method addresses                 */ 
  char methodsResolved;        /* Boolean set after processing method headers         */ 
 }; 
Declarations for class name declaration descriptors
 

The container field points to the declaration descriptor for the class in which this class declaration was lexically nested (if any). The super field points to the class this class extends (if any). The fields instVarCount and vars are used to keep track of the instance variables declared within the class. Similarly, methodCount and methods are used to keep track of the method declarations found within the class. The counts instVarCount and methodCount should include both the names declared locally and the variables and methods inherited from superclasses. The vars and methods lists, however, should include only names that are declared explicitly in the associated class. The classes field is the head pointer for a list of the declaration descriptors of any nested class declarations. The methodtab field will be used by later phases. The methodsResolved field holds a boolean that is set to true as soon as bindings for the class's method headers have been created. It can then be used to enforce the rule against forward references in extends clauses.
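One way methodPosition, overrides, methodCount, and methodsResolved might fit together is sketched below. It assumes the common method-table layout in which an overriding method reuses the slot of the method it overrides and each new method takes the next free slot after the inherited ones. Methods are matched by name alone, comparing identifier descriptor pointers, which works because the scanner creates one descriptor per distinct identifier; Woolite's actual rules may also require checking parameter and return types, which this sketch omits. The structures are trimmed, positions are assumed to be 0-based, and findMethod and assignMethodSlots are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

typedef struct identdesc { char *name; } identdesc;  /* simplified stand-in */

typedef enum {
  methoddecl, instvardecl, locvardecl, formaldecl, classdecl
} decltype;

/* Trimmed common fields: line and deflevel omitted for brevity. */
#define COMMONFIELDS \
   decltype type; \
   identdesc *ident; \
   union dcldesc *memberlink;

struct methoddesc {
  COMMONFIELDS
  int methodPosition;          /* slot in the method table            */
  union dcldesc *container;    /* class containing the method         */
  char overrides;              /* TRUE if method overrides another    */
};

struct classdesc {
  COMMONFIELDS
  union dcldesc *super;        /* superclass (or NULL)                */
  int methodCount;             /* includes inherited methods          */
  union dcldesc *methods;      /* methods declared in this class only */
  char methodsResolved;        /* set after processing method headers */
};

union dcldesc {
  struct methoddesc method;
  struct classdesc class;
};

/* Search cls and its superclasses for a method named id. */
union dcldesc *findMethod(union dcldesc *cls, identdesc *id) {
  for (; cls != NULL; cls = cls->class.super)
    for (union dcldesc *m = cls->class.methods; m != NULL;
         m = m->method.memberlink)
      if (m->method.ident == id)
        return m;
  return NULL;
}

/* Give each method declared in cls a method-table slot. */
void assignMethodSlots(union dcldesc *cls) {
  union dcldesc *super = cls->class.super;
  int count = (super != NULL) ? super->class.methodCount : 0;

  /* The superclass's slots must already be known. */
  assert(super == NULL || super->class.methodsResolved);
  for (union dcldesc *m = cls->class.methods; m != NULL;
       m = m->method.memberlink) {
    union dcldesc *inherited = findMethod(super, m->method.ident);
    m->method.container = cls;
    if (inherited != NULL) {        /* override: reuse the inherited slot */
      m->method.methodPosition = inherited->method.methodPosition;
      m->method.overrides = 1;
    } else {                        /* new method: take the next free slot */
      m->method.methodPosition = count++;
      m->method.overrides = 0;
    }
  }
  cls->class.methodCount = count;
  cls->class.methodsResolved = 1;
}
```

The assertion on the superclass's methodsResolved flag mirrors the use of that field described above: if a class's superclass has not yet had its method headers processed, the extends clause is a forward reference.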


Computer Science 434
Department of Computer Science
Williams College