
Symbol Tables vs. Symbol Table Organization

  1. The organization of most compiler symbol tables is fairly complex because it must support the scope rules associated with block structure. The standard approach to explaining symbol table organization, however, adds further complexity by failing to distinguish the scanner's role in building the symbol table from the role of the semantic analysis routines in completing it.

  2. Many compiler texts describe the symbol table as a dictionary, typically implemented using a hash table or a search tree.

  3. In my (somewhat odd) view, the symbol table is just a collection of records/structures in which the attributes of identifiers are stored. The hash table (or whatever else is used) is simply a mechanism that enables the scanner to associate symbol table entries with the character string form of identifiers it processes.

  4. Once the program has been transformed into a syntax tree, the semantic analyzer has direct access to symbol table entries through pointers stored in the tree. The hash table used by the scanner is not needed. Thus, it seems wrong to me to talk about the hash table as if it were the symbol table.
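
As a rough illustration of this separation, the sketch below keeps the hash table entirely on the scanner's side. The names `IdentifierDescriptor`, `IdentifierTable`, `IdentifierNode`, and `intern` are invented for this example and are not taken from any particular compiler. The point is only that tree nodes hold a pointer to the descriptor, so later phases never need the table.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical record holding whatever the compiler stores per identifier.
final class IdentifierDescriptor {
    final String spelling;
    IdentifierDescriptor(String spelling) { this.spelling = spelling; }
}

// Hypothetical syntax tree leaf: it holds a pointer to the descriptor,
// not the character string that was used as the hash key.
final class IdentifierNode {
    final IdentifierDescriptor id;
    IdentifierNode(IdentifierDescriptor id) { this.id = id; }
}

// Scanner-side lookup from character strings to identifier descriptors.
final class IdentifierTable {
    private final Map<String, IdentifierDescriptor> bySpelling = new HashMap<>();

    // Returns the existing descriptor for this spelling, creating one on first sight.
    IdentifierDescriptor intern(String spelling) {
        return bySpelling.computeIfAbsent(spelling, IdentifierDescriptor::new);
    }

    int size() { return bySpelling.size(); }

    public static void main(String[] args) {
        IdentifierTable table = new IdentifierTable();
        IdentifierNode a = new IdentifierNode(table.intern("count"));
        IdentifierNode b = new IdentifierNode(table.intern("count"));
        // Both occurrences share one descriptor, reachable without hashing.
        System.out.println(a.id == b.id);  // prints true
    }
}
```

Once the tree is built, `IdentifierTable` could be discarded while every `IdentifierNode` still reaches its descriptor directly.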

  5. The key to this approach is to start by thinking about the abstract entities that need to be represented in the symbol table.
  6. The syntax tree produced by the syntactic analyzer represents each identifier occurrence by a pointer to the corresponding identifier descriptor. Unfortunately, when processing the syntax tree to do code generation or type checking, it would be much more useful if each identifier in the program were represented in the tree by a pointer to the appropriate declaration descriptor.

    As a result, the first step in semantic processing will be to process the syntax tree creating the required declaration descriptors and replacing/augmenting the pointers to identifier descriptors in the tree by pointers to the appropriate declaration descriptors. We will call these processes declaration processing and identifier reference resolution.

  7. To produce an efficient compiler, we want to avoid algorithms that require either time or space that is more than linear in the size of the program whenever possible. This implies that we need a way to find the correct declaration descriptor for each identifier in our syntax tree in constant time!

  8. One way to arrange for such constant time processing is to ensure that when we encounter a reference to an identifier in the tree, the identifier descriptor for the identifier already points to a binding descriptor that in turn points to the correct (i.e. current) declaration of that identifier.
  9. As the semantic processing routines traverse the syntax tree, the bindings associated with a given identifier behave in a stack-like manner.
  10. The stack of bindings for a given identifier can be kept as a linked list with the head pointer stored in the associated identifier descriptor.
  11. If such stacks are maintained, it is easy to replace/augment each identifier descriptor pointer encountered while traversing the syntax tree.
  12. It is easy to push the necessary binding descriptor onto the appropriate identifier descriptor's stack for each identifier declared within a scope.
  13. Semantic processing of the end of a scope requires removal of all binding descriptors on the current scope's list from the binding stacks attached to the identifier descriptors involved.
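
The per-identifier binding stack described in items 8 through 13 might be sketched as follows. This is a minimal sketch rather than a definitive layout: the class names mirror the terms used in these notes, but the field names (`top`, `outer`, `description`) and the methods `pushBinding`, `popBinding`, and `current` are assumptions made for the example. Each operation touches only the head of a linked list, so each takes constant time.

```java
// Hypothetical record for one declaration; a real compiler would store
// types, offsets, and similar attributes here.
final class DeclarationDescriptor {
    final String description;
    DeclarationDescriptor(String description) { this.description = description; }
}

// One cell of an identifier's binding stack (a singly linked list).
final class BindingDescriptor {
    final DeclarationDescriptor declaration;
    final BindingDescriptor outer;  // the binding hidden by this one, or null
    BindingDescriptor(DeclarationDescriptor declaration, BindingDescriptor outer) {
        this.declaration = declaration;
        this.outer = outer;
    }
}

final class IdentifierDescriptor {
    final String spelling;
    BindingDescriptor top;  // head of the stack; null if no declaration is visible

    IdentifierDescriptor(String spelling) { this.spelling = spelling; }

    // Entering a scope that declares this identifier: push a new binding.
    void pushBinding(DeclarationDescriptor declaration) {
        top = new BindingDescriptor(declaration, top);
    }

    // Leaving that scope: uncover the enclosing binding.
    // Assumes a binding is present; no error checking in this sketch.
    void popBinding() {
        top = top.outer;
    }

    // The declaration an identifier reference should resolve to right now.
    DeclarationDescriptor current() {
        return top == null ? null : top.declaration;
    }

    public static void main(String[] args) {
        IdentifierDescriptor x = new IdentifierDescriptor("X");
        x.pushBinding(new DeclarationDescriptor("outer X"));
        x.pushBinding(new DeclarationDescriptor("inner X"));
        System.out.println(x.current().description);  // prints inner X
        x.popBinding();
        System.out.println(x.current().description);  // prints outer X
    }
}
```
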
  14. To summarize the process suggested above (and to make it more algorithm-like):
    1. The scanner creates a new identifier descriptor each time it sees an identifier it has not previously seen. It uses a hash table to keep track of the descriptors it has already created.
    2. The semantic processor creates a declaration descriptor each time it encounters a declaration.
    3. The semantic processor maintains a stack of binding descriptors for each identifier.
      • Each identifier's stack is pointed to by the identifier's descriptor.
      • The stack contains a binding descriptor for each declaration of the identifier associated with a scope that is currently "open" (i.e. that the semantic processing routines have started to work on but not yet completely finished).
      • The entry for the innermost scope is kept at the top of the stack. The outermost scope is at the bottom of the stack.
    4. The semantic processor maintains a list of the bindings encountered in each open scope. We will call these lists scope binding lists. These lists are then organized in a stack called the open scope stack, with the scope binding list for the innermost scope at the top of the stack.
    5. To process a scope (i.e. class, procedure, main program, etc.) the semantic processor
      • Pushes an empty binding list on the open scope stack.
      • For each declaration, creates a declaration descriptor, pushes a binding referring to this declaration onto the identifier's binding stack, and adds it to the topmost scope binding list on the open scope stack.
      • Scans the contents (statements, expressions, etc.) of the block, replacing each pointer to an identifier descriptor by a pointer to the declaration descriptor that is currently pointed to by the binding on the top of that identifier descriptor's binding stack.
      • Closes the scope by popping a binding descriptor from the stack of every identifier that has a binding on the topmost scope binding list and then popping this list off the open scope stack.
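
The scope-processing steps summarized above can be sketched roughly as below. This is one possible arrangement under stated assumptions, not the course's actual implementation: `ScopeProcessor` and its methods `openScope`, `declare`, `resolve`, and `closeScope` are hypothetical names, the open scope stack is held in a `java.util.ArrayDeque`, and duplicate declarations within a scope are not checked.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

final class DeclarationDescriptor {
    final String description;  // stand-in for real attributes (type, offset, ...)
    DeclarationDescriptor(String description) { this.description = description; }
}

final class IdentifierDescriptor {
    final String spelling;
    BindingDescriptor top;  // top of this identifier's binding stack, or null
    IdentifierDescriptor(String spelling) { this.spelling = spelling; }
}

final class BindingDescriptor {
    final IdentifierDescriptor identifier;
    final DeclarationDescriptor declaration;
    final BindingDescriptor outer;  // next entry down the identifier's stack
    BindingDescriptor(IdentifierDescriptor identifier,
                      DeclarationDescriptor declaration,
                      BindingDescriptor outer) {
        this.identifier = identifier;
        this.declaration = declaration;
        this.outer = outer;
    }
}

final class ScopeProcessor {
    // Open scope stack: one scope binding list per scope not yet closed.
    private final Deque<List<BindingDescriptor>> openScopes = new ArrayDeque<>();

    // Step 1: push an empty binding list on the open scope stack.
    void openScope() {
        openScopes.push(new ArrayList<>());
    }

    // Step 2: create the declaration, push a binding on the identifier's
    // stack, and record the binding on the topmost scope binding list.
    DeclarationDescriptor declare(IdentifierDescriptor id, String description) {
        DeclarationDescriptor declaration = new DeclarationDescriptor(description);
        BindingDescriptor binding = new BindingDescriptor(id, declaration, id.top);
        id.top = binding;
        openScopes.peek().add(binding);
        return declaration;
    }

    // Step 3: constant-time resolution of one identifier reference.
    DeclarationDescriptor resolve(IdentifierDescriptor id) {
        return id.top == null ? null : id.top.declaration;
    }

    // Step 4: pop one binding per entry on the topmost list, then drop the list.
    void closeScope() {
        for (BindingDescriptor binding : openScopes.pop()) {
            IdentifierDescriptor id = binding.identifier;
            id.top = id.top.outer;
        }
    }

    public static void main(String[] args) {
        IdentifierDescriptor x = new IdentifierDescriptor("X");
        ScopeProcessor p = new ScopeProcessor();
        p.openScope();
        p.declare(x, "outer X");
        p.openScope();
        p.declare(x, "inner X");
        System.out.println(p.resolve(x).description);  // prints inner X
        p.closeScope();
        System.out.println(p.resolve(x).description);  // prints outer X
        p.closeScope();
    }
}
```

The total work done by `closeScope` over a whole program is proportional to the number of declarations, which keeps the overall process linear in the size of the program.
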
  15. To understand how this all works together, consider the program shown in Figure *.

    class Program {
        int W;
        int X;
        class A {
            int W;
            int Y;
            int Z;
            void B() { int X; . . . }
            void C() { int X; int Y; . . . }
            . . .
        } // end of A
        void D() { int Z; . . . }
    }

    A class definition skeleton illustrating nested declarations
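
To make the behavior on this skeleton concrete, the sketch below traces which declaration each of W, X, Y, and Z would resolve to inside B, C, and D. It is deliberately simplified, and the simplifications are assumptions of this example: strings such as `A.W` stand in for declaration descriptors, a `Map` lookup stands in for following the pointer stored in an identifier descriptor, and only the variable declarations are tracked (the names A, B, C, and D themselves are ignored). `FigureTrace` and its methods are hypothetical names.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class FigureTrace {
    // One binding stack per identifier; strings stand in for descriptors.
    private final Map<String, Deque<String>> bindings = new HashMap<>();
    // Open scope stack: the names declared in each scope that is still open.
    private final Deque<List<String>> openScopes = new ArrayDeque<>();

    // Opens a scope and processes its declarations.
    void open(String scope, String... names) {
        List<String> declared = new ArrayList<>();
        for (String name : names) {
            bindings.computeIfAbsent(name, k -> new ArrayDeque<>())
                    .push(scope + "." + name);
            declared.add(name);
        }
        openScopes.push(declared);
    }

    // Closes the innermost scope, popping one binding per declared name.
    void close() {
        for (String name : openScopes.pop()) {
            bindings.get(name).pop();
        }
    }

    String resolve(String name) {
        Deque<String> stack = bindings.get(name);
        return (stack == null || stack.isEmpty()) ? "undeclared" : stack.peek();
    }

    // Resolves W, X, Y, and Z against the bindings visible right now.
    String visible() {
        return resolve("W") + " " + resolve("X") + " "
             + resolve("Y") + " " + resolve("Z");
    }

    static List<String> run() {
        FigureTrace t = new FigureTrace();
        List<String> log = new ArrayList<>();
        t.open("Program", "W", "X");
        t.open("A", "W", "Y", "Z");
        t.open("B", "X");
        log.add("in B: " + t.visible());
        t.close();                      // end of B
        t.open("C", "X", "Y");
        log.add("in C: " + t.visible());
        t.close();                      // end of C
        t.close();                      // end of A
        t.open("D", "Z");
        log.add("in D: " + t.visible());
        t.close();                      // end of D
        t.close();                      // end of Program
        return log;
    }

    public static void main(String[] args) {
        // Prints:
        //   in B: A.W B.X A.Y A.Z
        //   in C: A.W C.X C.Y A.Z
        //   in D: Program.W Program.X undeclared D.Z
        run().forEach(System.out::println);
    }
}
```

Note that Y is undeclared inside D: the bindings pushed for A's declarations were popped when A was closed, so only Program's declarations and D's own Z remain visible.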
     

    Symbol Table Organization
     

Computer Science 434
Department of Computer Science
Williams College
