CS 334
Programming Languages
Spring 2002

Assignment 3
Due Thursday, 2/28/02

  1. The following grammar is motivated by declarations in C:
        <Declaration> ::= <Type> <Declarator>
               <Type> ::= int | char
         <Declarator> ::= '*' <Declarator>
                        | <Declarator> '[' number ']'
                        | <Declarator> '(' <Type> ')'
                        | '(' <Declarator> ')'
                        | name

    a. Prove the syntactic ambiguity of this grammar by finding a string that has two distinct parse trees. Draw the parse trees.

    b. The constructs '[' number ']' and '(' <Type> ')' can be thought of as postfix operators with a declarator as an operand. Suppose that '*' has lower precedence than these operators. Write an unambiguous grammar that generates the same strings as this grammar and makes it easy to identify the components of a declarator.

    c. Suppose that the first production for Declarator is changed to make '*' a postfix operator (i.e., it goes after the Declarator rather than in front of it). Why is the resulting grammar unambiguous?

  2. Two famous sentences used as examples in linguistics are "Time flies like an arrow" and "Fruit flies like a banana." (I believe they are due to Noam Chomsky.) Please write plausible syntax rules for English that would allow you to parse both of these sentences. The point of the examples is to show how difficult it is to parse (and understand) natural languages. Please explain this difficulty.

  3. In file parseEnglish.ml you can find the skeleton of a parser for the English grammar given on page Ch4-5 of Louden. Please replace the "??" by real ML code so that

       parsestr sentence;
    returns true if and only if sentence is a string generated by the grammar.

    Function lexstr converts a string into a list of strings, where each string in the list corresponds to a word in the sentence. The function parse takes a list of strings representing words and returns true iff the list corresponds to a sentence according to the grammar. Notice that this is different from the parsers looked at in class, where we returned parse trees. This is simpler!

    The functions parseSentence, parseVerbPhrase, etc., take a list of tokens as argument and return a pair consisting of a boolean (indicating whether the function actually found that kind of phrase at the front of the list) and the remaining unused tokens. Thus parseNounPhrase ["the","girl","sees","a","dog"] would return the pair (true,["sees","a","dog"]).

    I have written parseArticle and parseSentence for you. This should be sufficient to help you write the other three. Just remember that your calls to parseX functions should exactly follow the grammar.
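
    The return convention described above (a boolean paired with the unused tokens) can be sketched as follows. This is only an illustration, not the code in parseEnglish.ml: the names ending in Sketch are hypothetical, the word lists and the rule "noun phrase = article followed by noun" are assumptions, and returning the tokens unchanged on failure is also an assumption.

```sml
(* Hypothetical sketch; NOT the parseArticle supplied in parseEnglish.ml. *)
fun parseArticleSketch (word :: rest) =
      if word = "a" orelse word = "the"
      then (true, rest)              (* found an article; hand back the rest *)
      else (false, word :: rest)     (* assumption: tokens unchanged on failure *)
  | parseArticleSketch [] = (false, [])

(* Assumed noun list, for illustration only. *)
fun parseNounSketch (word :: rest) =
      if word = "girl" orelse word = "dog"
      then (true, rest)
      else (false, word :: rest)
  | parseNounSketch [] = (false, [])

(* Assumed rule: a noun phrase is an article followed by a noun.
   Each call consumes tokens and passes the remainder to the next call. *)
fun parseNounPhraseSketch tokens =
  case parseArticleSketch tokens of
      (true, afterArticle) =>
        (case parseNounSketch afterArticle of
             (true, afterNoun) => (true, afterNoun)
           | (false, _) => (false, tokens))
    | (false, _) => (false, tokens)

val result = parseNounPhraseSketch ["the", "girl", "sees", "a", "dog"]
```

    The point of the sketch is the threading of the token list: the tokens left over by one call become the input of the next, in the order the grammar rule lists its parts.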

  4. In this problem, I want you to begin to design an ML interpreter for a simple functional language. Our language is relatively simple, but more sophisticated than the arithmetic expressions of last week since it involves functions. The expressions are written in the language given by the following simple BNF grammar.

       e ::= x | n | true | false | succ | pred | iszero | 
         if e then e else e | (fn x => e) | (e e) | rec x => e
    In the above, "x" is a variable, "n" stands for an integer, "true" and "false" are the truth values, "succ" and "pred" are unary functions which respectively add 1 to and subtract 1 from their argument, "iszero" is a unary function which returns "true" if its argument is 0 and "false" otherwise, "if...then...else..." is a conditional expression, "fn x => e" is a function with formal parameter "x" and body "e", and "(e e)" represents function application. (Don't worry about "rec x => e" for now! It is used for defining recursive functions.)

    As in last week's assignment, we will presume that we have a parser which parses input into an abstract syntax tree, which your interpreter should use. The definition of the ML datatype is

       datatype term = AST_ID of string | AST_NUM of int | AST_BOOL of bool
                     | AST_SUCC | AST_PRED | AST_ISZERO
                     | AST_IF of (term * term * term) | AST_ERROR of string
                     | AST_FUN of (string * term) | AST_APP of (term * term)
                     | AST_REC of (string * term)
    As before, this definition mirrors the BNF grammar given above; for instance, the constructor AST_ID makes a string into an identifier or variable, and the constructor AST_FUN makes a string representing the formal parameter and a term representing the body of the function into a function. Interpreting abstract syntax trees is much easier than trying to interpret terms directly.
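
    To make the correspondence between the grammar and the datatype concrete, here is a small sketch that builds two terms by hand and prints them back in the concrete syntax. The function toString and the value names are hypothetical helpers introduced for this illustration; they are not part of the provided files.

```sml
datatype term = AST_ID of string | AST_NUM of int | AST_BOOL of bool
              | AST_SUCC | AST_PRED | AST_ISZERO
              | AST_IF of (term * term * term) | AST_ERROR of string
              | AST_FUN of (string * term) | AST_APP of (term * term)
              | AST_REC of (string * term)

(* if (iszero 0) then 1 else 2 *)
val ifExample = AST_IF (AST_APP (AST_ISZERO, AST_NUM 0), AST_NUM 1, AST_NUM 2)

(* (fn x => (succ x)) *)
val succFn = AST_FUN ("x", AST_APP (AST_SUCC, AST_ID "x"))

(* Hypothetical helper: one clause per constructor, mirroring the BNF. *)
fun toString (AST_ID x) = x
  | toString (AST_NUM n) = Int.toString n
  | toString (AST_BOOL b) = Bool.toString b
  | toString AST_SUCC = "succ"
  | toString AST_PRED = "pred"
  | toString AST_ISZERO = "iszero"
  | toString (AST_IF (b, e1, e2)) =
      "if " ^ toString b ^ " then " ^ toString e1 ^ " else " ^ toString e2
  | toString (AST_ERROR msg) = "error: " ^ msg
  | toString (AST_FUN (x, e)) = "(fn " ^ x ^ " => " ^ toString e ^ ")"
  | toString (AST_APP (e1, e2)) = "(" ^ toString e1 ^ " " ^ toString e2 ^ ")"
  | toString (AST_REC (x, e)) = "rec " ^ x ^ " => " ^ toString e
```

    Here toString ifExample yields "if (iszero 0) then 1 else 2" and toString succFn yields "(fn x => (succ x))". The same one-clause-per-constructor shape is what pattern matching over terms generally looks like.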

    You are to write an ML function interp that takes an abstract syntax tree representing a term and returns the result of evaluating it, which will also be an abstract syntax tree. The reduction should be done according to the rules given below. The expression "e => v" means that the term "e" evaluates to "v" (and then can be evaluated no further). The rules below are written for the expressions in the original grammar. Your program should be written for the equivalent expressions using the abstract syntax trees (elements of type "term").

    The base cases are:

    (1) n => n for n an integer.

    (2) true => true, and similarly for false

    (3) error => error

    (4) succ => succ, and similarly for the other initial functions

    The other cases are slightly more complicated. They are written in the form of a rule in the manner of the following example:

             b => true         e1 => v
        (5) ----------------------------
             if b then e1 else e2 => v
    We read the rule from the bottom up: if the expression is an if-then-else with components b, e1, and e2, and b evaluates to true and e1 returns v, then the entire expression returns v. Of course, we also have the symmetric rule
             b => false        e2 => v
        (6) ----------------------------
             if b then e1 else e2 => v
    The following are some of the cases for applications:
             e1 => succ        e2 => n
        (7) ----------------------------
                 (e1 e2) => (n+1)
             e1 => pred        e2 => 0       e1 => pred    e2 => (n+1)
        (8) ---------------------------     ---------------------------
                 (e1 e2) => 0                      (e1 e2) => n
             e1 => iszero   e2 => 0          e1 => iszero   e2 => (n+1)
        (9) ------------------------        ----------------------------
                 (e1 e2) => true                    (e1 e2) => false
    Here is a simple example using these rules: Evaluate
       if (iszero 0) then 1 else 2
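    According to rules (5) and (6), we must first evaluate the condition (iszero 0). By rule (9), together with iszero => iszero from rule (4) and 0 => 0 from rule (1), this evaluates to true. Now by rule (5) (and the fact that 1 => 1 via rule (1)), the whole expression evaluates to 1.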
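
    The worked example above can be traced in code. The following is a minimal sketch covering only rules (1), (2), (4) for iszero, (5), (6), and (9); the name evalSketch and the error messages are assumptions made for this illustration, and the remaining rules are deliberately left out.

```sml
datatype term = AST_ID of string | AST_NUM of int | AST_BOOL of bool
              | AST_SUCC | AST_PRED | AST_ISZERO
              | AST_IF of (term * term * term) | AST_ERROR of string
              | AST_FUN of (string * term) | AST_APP of (term * term)
              | AST_REC of (string * term)

(* Hypothetical partial evaluator: each clause is one rule read bottom-up. *)
fun evalSketch (AST_NUM n) = AST_NUM n                 (* rule (1) *)
  | evalSketch (AST_BOOL b) = AST_BOOL b               (* rule (2) *)
  | evalSketch AST_ISZERO = AST_ISZERO                 (* rule (4) *)
  | evalSketch (AST_IF (b, e1, e2)) =
      (case evalSketch b of
           AST_BOOL true => evalSketch e1              (* rule (5) *)
         | AST_BOOL false => evalSketch e2             (* rule (6) *)
         | _ => AST_ERROR "if: condition is not a boolean")
  | evalSketch (AST_APP (e1, e2)) =
      (case (evalSketch e1, evalSketch e2) of
           (AST_ISZERO, AST_NUM 0) => AST_BOOL true    (* rule (9), left *)
         | (AST_ISZERO, AST_NUM _) => AST_BOOL false   (* rule (9), right *)
         | _ => AST_ERROR "application not covered by this sketch")
  | evalSketch _ = AST_ERROR "term not covered by this sketch"

(* if (iszero 0) then 1 else 2 *)
val answer =
  evalSketch (AST_IF (AST_APP (AST_ISZERO, AST_NUM 0), AST_NUM 1, AST_NUM 2))
```

    Here answer is AST_NUM 1, matching the hand evaluation. Note that only the branch selected by the condition is evaluated, exactly as rules (5) and (6) require.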

    a. Use these rules to write an interpreter, interp: term -> term, for the subset of the language which does not include terms of the form AST_ID, AST_FUN, or AST_REC. If your interpreter tries to evaluate any of these three kinds of term, it should return an error term built with AST_ERROR.

    Note: The file, parsePCF.ml, contains an ML program that parses strings or files containing an expression from the simple BNF grammar given above into an expression using the AST terms. Thus, if you load the parser with use "parserPCF.ml"; and then parse the string succ 7, the system will return AST_APP(AST_SUCC, AST_NUM 7). Similarly, if you have a file "foo.pcf" containing succ 7, parsefile "foo.pcf" returns AST_APP(AST_SUCC, AST_NUM 7). Feel free to use these functions to generate abstract syntax trees, which is much easier than typing in the long AST terms directly.

    I have also provided you with the skeleton of the interpreter in a file called PCF.interp.student.sml, which also contains brief explanations and examples.

    b. The notation e[x := v] indicates the textual substitution of v for all free occurrences of x in e. For example, (succ x) [x:=1] is the expression (succ 1). Please write an ML function subst that takes a term, t, a string representing a variable, v, and a term, s, and returns t with all free occurrences of v (actually AST_ID v) replaced by s. Thus, the function application (corresponding to (succ x) [x:=1], above),

            subst (AST_APP(AST_SUCC, AST_ID "x")) "x" (AST_NUM 1)
    gives the answer
            AST_APP(AST_SUCC, AST_NUM 1)

    Do not substitute for bound occurrences of variables. For example, substituting 3 for x in (x + ((fn x => 2+x) 8)) should result in (3 + ((fn x => 2+x) 8)). The formal parameter x and its occurrences in the body of the function are not affected by the substitution because of the static scoping rules. (The + here is only for illustration; it is not part of our language.)

    Hint: Just as in part a, use pattern-matching on each constructor of the abstract syntax tree, calling subst recursively when you need to.
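
    One possible shape for such a function is sketched below. It is only a sketch under a stated assumption: the term being substituted in, s, is assumed to be closed (it has no free variables), so no renaming of bound variables is attempted. That assumption holds for the uses of substitution in this assignment when the whole program is closed, but it is not a general capture-avoiding substitution.

```sml
datatype term = AST_ID of string | AST_NUM of int | AST_BOOL of bool
              | AST_SUCC | AST_PRED | AST_ISZERO
              | AST_IF of (term * term * term) | AST_ERROR of string
              | AST_FUN of (string * term) | AST_APP of (term * term)
              | AST_REC of (string * term)

(* subst t x s: replace free occurrences of AST_ID x in t by s.
   Assumption: s is closed, so variable capture cannot occur. *)
fun subst (AST_ID y) x s = if x = y then s else AST_ID y
  | subst (AST_IF (b, e1, e2)) x s =
      AST_IF (subst b x s, subst e1 x s, subst e2 x s)
  | subst (AST_APP (e1, e2)) x s = AST_APP (subst e1 x s, subst e2 x s)
  | subst (AST_FUN (y, e)) x s =
      (* fn y => e binds y: stop if y is the variable being replaced *)
      if x = y then AST_FUN (y, e) else AST_FUN (y, subst e x s)
  | subst (AST_REC (y, e)) x s =
      (* rec y => e binds y in the same way *)
      if x = y then AST_REC (y, e) else AST_REC (y, subst e x s)
  | subst t _ _ = t   (* numbers, booleans, succ, pred, iszero, errors *)

val free = subst (AST_APP (AST_SUCC, AST_ID "x")) "x" (AST_NUM 1)
val bound = subst (AST_APP (AST_ID "x", AST_FUN ("x", AST_ID "x"))) "x" (AST_NUM 3)
```

    Here free is AST_APP (AST_SUCC, AST_NUM 1), while in bound only the first x is replaced: the result is AST_APP (AST_NUM 3, AST_FUN ("x", AST_ID "x")).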

    c. Using your substitution function, extend your interp function from part a to include AST_FUN terms. The reduction for the terms involving AST_FUN should be done according to the rules given below:

    Functions by themselves don't do anything (just like succ and pred above):

        (10) (fn x => e) => (fn x => e)
    Computations occur when you apply these functions to arguments. The next rule defines call-by-value function application, as in ML. If the function is of the form fn x => e, evaluate the operand to a value, v1, substitute v1 in for the formal parameter in e, and then evaluate the modified body:
                e1 => (fn x => e3)      e2 => v1    e3[x:=v1] => v
        (11)    --------------------------------------------------------
                                  (e1 e2) => v
    For instance, in evaluating the application
       ((fn x => (succ x)) (succ 0))
    we first note that the function is already fully evaluated, so we evaluate (succ 0) to 1, and then plug this in for x in the body, (succ x), of the function, obtaining (succ 1), which evaluates to 2.

    Notice that while terms of the form (AST_ID s) can appear whenever s is a formal parameter, we never need to evaluate terms of the form (AST_ID s), because they are always replaced by the subst function before we evaluate the body of the function.

    Any variables which have not yet been replaced by other terms at the time of evaluation represent unbound variables (those not introduced as formal parameters). You should return AST_ERROR if your interpreter is applied to a term of that form.

    d. Surprisingly enough, evaluating recursive terms is pretty trivial. First let's talk about what a term of the form rec x => e actually means. It corresponds to the definition of a recursive function called x. Let's work with an example. The term

       rec sum => fn x => fn y => if (iszero x) then y else sum (pred x) (succ y)
    represents the following equivalent recursive function definition:
       sum x y = if x = 0 then y else sum (x - 1) (y + 1)
    Thus the variable immediately after rec is the name of the recursive function, while everything after the => is the body of the function being defined. We use a strange notation like this because we do not have a mechanism in this simple language to give a name to values or functions. However, the idea is pretty straightforward. In the definition, rec x => e, all occurrences of x in the body e stand for the entire function.

    The rules for evaluating a recursive term are pretty simple. Just evaluate the body of the term, where all occurrences of the recursively defined variable are replaced by the entire rec term.

                      e[x:=rec x => e] => v
        (12)         --------------------------
                        (rec x => e) => v
    Thus we can evaluate:
       rec sum => fn x => fn y => if (iszero x) then y else sum (pred x) (succ y)
    by replacing it with
       fn x => fn y => if (iszero x) then y else sum' (pred x) (succ y)
    where sum' abbreviates the entire expression above which begins with rec sum ....
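
    To see rules (10) through (12) working together on the sum example, here is one way the pieces might fit. It is a sketch rather than the expected solution: the error messages are invented, error propagation is handled only minimally, and subst assumes that the term being substituted is closed.

```sml
datatype term = AST_ID of string | AST_NUM of int | AST_BOOL of bool
              | AST_SUCC | AST_PRED | AST_ISZERO
              | AST_IF of (term * term * term) | AST_ERROR of string
              | AST_FUN of (string * term) | AST_APP of (term * term)
              | AST_REC of (string * term)

(* Substitution of a closed term s for free occurrences of x. *)
fun subst (AST_ID y) x s = if x = y then s else AST_ID y
  | subst (AST_IF (b, e1, e2)) x s =
      AST_IF (subst b x s, subst e1 x s, subst e2 x s)
  | subst (AST_APP (e1, e2)) x s = AST_APP (subst e1 x s, subst e2 x s)
  | subst (AST_FUN (y, e)) x s =
      if x = y then AST_FUN (y, e) else AST_FUN (y, subst e x s)
  | subst (AST_REC (y, e)) x s =
      if x = y then AST_REC (y, e) else AST_REC (y, subst e x s)
  | subst t _ _ = t

fun interp (AST_ID x) = AST_ERROR ("unbound variable " ^ x)
  | interp (AST_IF (b, e1, e2)) =
      (case interp b of
           AST_BOOL true => interp e1                        (* rule (5) *)
         | AST_BOOL false => interp e2                       (* rule (6) *)
         | _ => AST_ERROR "if: condition is not a boolean")
  | interp (AST_APP (e1, e2)) =
      (case (interp e1, interp e2) of
           (_, AST_ERROR msg) => AST_ERROR msg
         | (AST_SUCC, AST_NUM n) => AST_NUM (n + 1)          (* rule (7) *)
         | (AST_PRED, AST_NUM 0) => AST_NUM 0                (* rule (8) *)
         | (AST_PRED, AST_NUM n) => AST_NUM (n - 1)
         | (AST_ISZERO, AST_NUM 0) => AST_BOOL true          (* rule (9) *)
         | (AST_ISZERO, AST_NUM _) => AST_BOOL false
         | (AST_FUN (x, body), v) => interp (subst body x v) (* rule (11) *)
         | _ => AST_ERROR "application of a non-function")
  | interp (AST_REC (x, e)) =
      interp (subst e x (AST_REC (x, e)))                    (* rule (12) *)
  | interp t = t                              (* rules (1)-(4) and (10) *)

(* rec sum => fn x => fn y =>
     if (iszero x) then y else sum (pred x) (succ y) *)
val sumTerm =
  AST_REC ("sum", AST_FUN ("x", AST_FUN ("y",
    AST_IF (AST_APP (AST_ISZERO, AST_ID "x"),
            AST_ID "y",
            AST_APP (AST_APP (AST_ID "sum", AST_APP (AST_PRED, AST_ID "x")),
                     AST_APP (AST_SUCC, AST_ID "y"))))))

val five = interp (AST_APP (AST_APP (sumTerm, AST_NUM 2), AST_NUM 3))
```

    Here five is AST_NUM 5. Each recursive call unfolds the rec term once more via rule (12), so the term is only expanded as far as the computation actually needs.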


    1. While this program is broken up into four parts in order to help you make progress through it, you may turn in a single interpreter function which interprets the entire language (though don't forget to turn in subst as well).

    2. The parser I've provided you with actually parses a slightly richer language than I've presented here. In particular, it allows you to write terms of the form:
         let x = E1 in E2 end
      For example,
         let z = 2 in succ z end
      is recognized by the parser. This makes it much easier to write interesting terms of the language to evaluate.

      Rather than complicating the interpreter, the parser translates let clauses as if they were function applications. Thus the general form gets translated as if it were

         (fn x => E2) E1
      A moment's thought will show you that this has exactly the same meaning as the let clause. Thus the example above will be parsed as:
         AST_APP (AST_FUN ("z",AST_APP (AST_SUCC,AST_ID "z")),AST_NUM 2)
      Thus you may use let clauses in creating examples to test your interpreter, without having to worry about including a new clause in the interpreter.

    3. I have included in the same directory as the parser and the starter version of your program a few simple pcf programs. Their names are of the form "test.xn.pcf" where "x" is the relevant section of the problem ("a"-"d") and "n" is a number to distinguish different test programs for the same section. These examples are not enough to test your interpreter thoroughly, but are instead there as examples of correct syntax.

    4. You need not worry about how the parser.sml program works (though feel free to take a look to see how a lexical scanner and parser work). However, I would like to point out to you that the beginning of the file contains the command:
         Compiler.Control.Print.printDepth := 100;
      This assignment statement tells ML to print more of each answer. Normally ML prints a nested value only down to a small depth, inserting a hash symbol (#) to indicate where it has elided the deeper parts; abstract syntax trees are deeply nested, so they would otherwise be cut off. The command updates a variable in a module to raise that depth limit to 100.
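
      The translation of let clauses described in note 2 above amounts to a one-line function on abstract syntax trees. The sketch below uses a hypothetical helper, desugarLet, which is not taken from the provided parser; it only illustrates the shape of the translation.

```sml
datatype term = AST_ID of string | AST_NUM of int | AST_BOOL of bool
              | AST_SUCC | AST_PRED | AST_ISZERO
              | AST_IF of (term * term * term) | AST_ERROR of string
              | AST_FUN of (string * term) | AST_APP of (term * term)
              | AST_REC of (string * term)

(* Hypothetical helper: let x = e1 in e2 end  becomes  (fn x => e2) e1 *)
fun desugarLet (x, e1, e2) = AST_APP (AST_FUN (x, e2), e1)

(* let z = 2 in succ z end *)
val letExample = desugarLet ("z", AST_NUM 2, AST_APP (AST_SUCC, AST_ID "z"))
```

      Here letExample is AST_APP (AST_FUN ("z", AST_APP (AST_SUCC, AST_ID "z")), AST_NUM 2), the same tree shown in note 2.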
