TopParsing: The Problem of Finding a Derivation Top-down Parsing

Top-down Parsing

  1. At any step in the process of a top-down parse, one has a sentential form that one wishes to rewrite so that it more closely matches the target string. At each such step one must make two choices:
    1. which of the non-terminals in the current sentential form to replace (we choose the left-most one, which supports left-to-right processing of the input), and
    2. which production to apply to the selected non-terminal.
  2. To eliminate the evil influence of intuition, let's consider finding a derivation using a grammar that is not very meaningful:

    <S> → a <R>  |  b <S> b <R>
    <R> → b <R>  |  a

  3. Consider the process of using a top-down parser to find a derivation for 'bbaababa' relative to the grammar given above.
    Matched terminals   Tail of sentential form    Pending input
    -----------------   -----------------------    -------------
                        <S>                        bbaababa
    b                   <S> b <R>                  baababa
    bb                  <S> b <R> b <R>            aababa
    bba                 <R> b <R> b <R>            ababa
    bbaab               <R> b <R>                  aba
    bbaabab             <R>                        a
    bbaababa
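
Concatenating the first two columns of each row gives a sentential form of a left-most derivation of 'bbaababa'. As a minimal Python sketch, that derivation can be replayed by repeatedly rewriting the left-most non-terminal. Here the two choices from item 1 are made by hand: the production sequence (the hypothetical list CHOICES) is supplied rather than found by a parser, and the names GRAMMAR, leftmost_step and derive are illustrative assumptions, not part of the notes.

```python
# The example grammar; each non-terminal maps to its list of right-hand sides.
GRAMMAR = {
    "<S>": [["a", "<R>"], ["b", "<S>", "b", "<R>"]],
    "<R>": [["b", "<R>"], ["a"]],
}


def leftmost_step(form, lhs, rhs):
    """Rewrite the left-most non-terminal of `form` using lhs -> rhs."""
    for i, sym in enumerate(form):
        if sym in GRAMMAR:
            # The left-most non-terminal must be the one the production rewrites.
            if sym != lhs or rhs not in GRAMMAR[lhs]:
                raise ValueError(f"cannot apply {lhs} -> {rhs} to {sym}")
            return form[:i] + rhs + form[i + 1:]
    raise ValueError("no non-terminal left to rewrite")


# Hypothetical: the production choices a top-down parse of 'bbaababa' makes.
CHOICES = [
    ("<S>", ["b", "<S>", "b", "<R>"]),
    ("<S>", ["b", "<S>", "b", "<R>"]),
    ("<S>", ["a", "<R>"]),
    ("<R>", ["a"]),
    ("<R>", ["a"]),
    ("<R>", ["a"]),
]


def derive(choices, start="<S>"):
    """Return every sentential form of the left-most derivation, in order."""
    form = [start]
    forms = [form]
    for lhs, rhs in choices:
        form = leftmost_step(form, lhs, rhs)
        forms.append(form)
    return forms


for f in derive(CHOICES):
    print(" ".join(f))
```

Each printed line is one sentential form; every row of the trace table appears among them, because the table simply skips some of the intermediate steps.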
  4. Note that:
    1. if we concatenate "Matched terminals" and "Tail of sentential form", we always obtain a complete sentential form of the grammar, and
    2. the "Tail of sentential form" column behaves like a stack: expanding a non-terminal replaces the top symbol with a right-hand side, and matching a terminal pops it.
    A top-down parser can therefore be implemented by explicitly maintaining this stack as a data structure.
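
The stack observation can be turned into a small parser. The following is a minimal Python sketch, not the course's implementation: it keeps the tail of the sentential form on an explicit stack, expands a non-terminal by replacing it with one of its right-hand sides, and pops a terminal when it matches the next input character. To choose among alternatives it assumes one character of lookahead is enough, which holds for this grammar because the alternatives for each non-terminal begin with different terminals. GRAMMAR and parse_with_stack are illustrative names.

```python
# The example grammar; non-terminals are the keys of this dictionary.
GRAMMAR = {
    "<S>": [["a", "<R>"], ["b", "<S>", "b", "<R>"]],
    "<R>": [["b", "<R>"], ["a"]],
}


def parse_with_stack(text, start="<S>"):
    """Top-down parse of `text` using an explicit stack.

    Returns (matched, tail of sentential form, pending input) rows: one for
    the starting configuration and one after each terminal is matched.
    """
    stack = [start]  # top of the stack is stack[-1]
    pos = 0          # index of the first unmatched input character

    def row():
        # The tail column lists the stack from top to bottom.
        return (text[:pos], " ".join(reversed(stack)), text[pos:])

    rows = [row()]
    while stack:
        top = stack.pop()
        look = text[pos] if pos < len(text) else None
        if top in GRAMMAR:
            # Expand: push the alternative whose right-hand side starts with
            # the lookahead character (reversed, so its first symbol is on top).
            for rhs in GRAMMAR[top]:
                if rhs[0] == look:
                    stack.extend(reversed(rhs))
                    break
            else:
                raise SyntaxError(f"no production for {top} on {look!r}")
        elif top == look:
            # Match: the terminal has been popped; consume one input character.
            pos += 1
            rows.append(row())
        else:
            raise SyntaxError(f"expected {top!r}, found {look!r}")
    if pos != len(text):
        raise SyntaxError(f"unconsumed input: {text[pos:]!r}")
    return rows


for matched, tail, pending in parse_with_stack("bbaababa"):
    print(f"{matched:<10}{tail:<20}{pending}")
```

Because it records a row after every match, this sketch prints more rows than the table in item 3, including each of the table's rows.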
  5. To make our parse "deterministic", we want to decide how to expand the non-terminal on top of the stack based only on what we have matched so far and on some finite prefix of the remaining input. If this is possible using a prefix of length k, we say that the grammar is LL(k).
  6. In many cases, this is not possible for any k.
  7. For most languages, however, we can find a grammar in which one can determine which production to use next by just looking at the first unmatched character. Such a grammar is called an LL(1) grammar.
  8. The following grammar:

    <stmt>   → if <expr> then <stmt> <iftail>
    <iftail> → else <stmt> end
             |  end

    is obviously LL(1) because:
    1. the right-hand side of each production begins with a terminal, and
    2. if two productions have the same left-hand side, then their right-hand sides begin with different terminal symbols.

    A grammar with these two properties is said to be an S-grammar. Any S-grammar is LL(1).
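
The two conditions are easy to check mechanically. Below is a minimal Python sketch, assuming a grammar is given as a dictionary from each non-terminal to a list of right-hand sides and that non-terminals are written "<name>"; both representation choices, and the names is_s_grammar and is_nonterminal, are assumptions for illustration.

```python
def is_nonterminal(sym):
    # Assumed convention: non-terminals are written "<name>".
    return sym.startswith("<") and sym.endswith(">")


def is_s_grammar(grammar):
    """Return True if every right-hand side starts with a terminal and the
    alternatives for each non-terminal start with distinct terminals."""
    for alternatives in grammar.values():
        seen = set()
        for rhs in alternatives:
            if not rhs or is_nonterminal(rhs[0]):
                return False  # property 1 fails
            if rhs[0] in seen:
                return False  # property 2 fails
            seen.add(rhs[0])
    return True


STMT_GRAMMAR = {
    "<stmt>": [["if", "<expr>", "then", "<stmt>", "<iftail>"]],
    "<iftail>": [["else", "<stmt>", "end"], ["end"]],
}

print(is_s_grammar(STMT_GRAMMAR))  # True
```

The productions for <expr> are not given in the notes, so only the listed productions are checked. The <S>/<R> grammar passes as well, while a rule such as <E> → <E> b fails the first condition.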

  9. In the case of top-down parsing, insisting on a deterministic parse produced by a single left-to-right scan of the input means that the parser produces left-most derivations.
  10. One of the attractions of top-down parsing is that there is a simple scheme for implementing a top-down parser in any language that supports recursion. The following procedure skeletons show what such a "recursive descent" parser looks like for the S-grammar:

    <S> → a <R>  |  b <S> b <R>
    <R> → b <R>  |  a

    They assume that "ch" holds the next input character to be processed and that "getnextchar" advances to the following character:
    
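    procedure R;
        if ch = 'b' then            { <R> → b <R> }
            getnextchar;
            R
        else if ch = 'a' then       { <R> → a }
            getnextchar
        else                        { ch cannot start an <R> }
            error
        end
    end R;
    
    procedure S;
        if ch = 'a' then            { <S> → a <R> }
            getnextchar;
            R
        else if ch = 'b' then       { <S> → b <S> b <R> }
            getnextchar;
            S;
            if ch = 'b' then        { match the second 'b' }
                getnextchar
            else
                error
            end;
            R
        else                        { ch cannot start an <S> }
            error
        end
    end S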
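
For readers who want to run the procedures, here is a Python sketch of the same recursive-descent recognizer. The Parser class, the ParseError exception and the accepts helper are assumptions for illustration; the methods keep the names S and R to mirror the procedures. ch is None once the input is exhausted, and accepts additionally requires that the whole input be consumed, a check the skeletons leave to their caller.

```python
class ParseError(Exception):
    pass


class Parser:
    """Recursive-descent recognizer for
         <S> -> a <R> | b <S> b <R>
         <R> -> b <R> | a
    """

    def __init__(self, text):
        self.text = text
        self.pos = 0
        self.ch = text[0] if text else None  # next input character, or None

    def getnextchar(self):
        self.pos += 1
        self.ch = self.text[self.pos] if self.pos < len(self.text) else None

    def error(self):
        raise ParseError(f"unexpected {self.ch!r} at position {self.pos}")

    def R(self):
        if self.ch == "b":        # <R> -> b <R>
            self.getnextchar()
            self.R()
        elif self.ch == "a":      # <R> -> a
            self.getnextchar()
        else:
            self.error()

    def S(self):
        if self.ch == "a":        # <S> -> a <R>
            self.getnextchar()
            self.R()
        elif self.ch == "b":      # <S> -> b <S> b <R>
            self.getnextchar()
            self.S()
            if self.ch == "b":    # match the second 'b'
                self.getnextchar()
            else:
                self.error()
            self.R()
        else:
            self.error()


def accepts(text):
    """True if `text` is a sentence of the grammar (whole input consumed)."""
    p = Parser(text)
    try:
        p.S()
    except ParseError:
        return False
    return p.ch is None


print(accepts("bbaababa"), accepts("aab"))  # True False
```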
    

  11. One of the nice things about recursive descent parsing is that you can "massage" the code instead of the grammar, for example by replacing a recursive call with a loop.
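
As an illustration of item 11 (a sketch under assumptions, not code from the notes): procedure R calls itself only as its last action, so the recursion can be replaced by a loop that skips a run of 'b's and then requires an 'a'. This recognizes exactly the strings generated by <R> without changing the grammar. The function name parse_R and the style of passing in and returning an input position are hypothetical.

```python
def parse_R(text, pos):
    """Recognize <R> -> b <R> | a starting at text[pos].

    Returns the position just after the recognized <R>. The tail call in the
    recursive version becomes a loop over the leading 'b's.
    """
    while pos < len(text) and text[pos] == "b":
        pos += 1
    if pos < len(text) and text[pos] == "a":
        return pos + 1
    raise SyntaxError(f"expected 'a' or 'b' at position {pos}")


print(parse_R("bbba", 0))  # 4
```

The loop version also avoids growing the call stack on long runs of 'b's, which the recursive version would do in a language without tail-call elimination, such as Python.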

Computer Science 434
Department of Computer Science
Williams College
