Formal Grammars

  1. Context free grammars are a notation for describing sets of strings (each phrase type is really just the name of a set of strings). So, we start our formal study of grammars with some basic definitions concerning strings and sets of strings:
    Alphabet
    A finite, non-empty set of symbols.
    String
    A string over some alphabet (Σ) is a finite, possibly empty sequence of symbols from Σ. We will use eps (ε) to denote the empty string.
    Language
    A language over an alphabet is just a set of strings over that alphabet.
    Concatenation
    If x and y are two strings over Σ, then their concatenation, xy, is the string obtained by placing the sequence of symbols in x before the sequence of symbols in y. If z = xy is a string, we say that x is a prefix of z and y is a suffix of z.
    Products
    If X and Y are languages over some alphabet, then their product, XY, is defined to be:
    { xy | x ∈ X and y ∈ Y }
    Powers
    If X is a language over Σ, we define X^n to be {eps}, the language containing only the empty string, if n = 0, and X X^(n-1) otherwise.
    Closures
    If X is a language over Σ, we define X+, the positive closure of X, to be the union of the sets X^1, X^2, X^3, ..., and X*, the closure of X, to be the union of X^0 and X+.
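The operations above can be tried out on small finite languages. The Python sketch below represents a string as a Python `str` and a language as a `set` of strings, with `""` playing the role of eps. The helper names `product`, `power`, `closure_up_to` and `positive_closure_up_to` are hypothetical and chosen only for illustration. X* and X+ are usually infinite, so the closure helpers return only the members whose length is at most a chosen bound.

```python
def product(X: set[str], Y: set[str]) -> set[str]:
    """XY = { xy | x in X and y in Y }."""
    return {x + y for x in X for y in Y}

def power(X: set[str], n: int) -> set[str]:
    """X^0 = {eps}; X^n = X X^(n-1) for n > 0."""
    result = {""}  # the language containing only the empty string
    for _ in range(n):
        result = product(X, result)
    return result

def closure_up_to(X: set[str], max_len: int) -> set[str]:
    """The strings of X* = X^0 ∪ X^1 ∪ X^2 ∪ ... of length at most max_len."""
    result = {""}  # X^0
    while True:
        # X* is the union of {eps} and X X*; keep prepending members of X
        # until no new string within the length bound appears.
        grown = result | {x + s for x in X for s in result if len(x + s) <= max_len}
        if grown == result:
            return result
        result = grown

def positive_closure_up_to(X: set[str], max_len: int) -> set[str]:
    """The strings of X+ = X^1 ∪ X^2 ∪ ... (which equals X X*) of length at most max_len."""
    return {w for w in product(X, closure_up_to(X, max_len)) if len(w) <= max_len}

X = {"a", "bc"}
print(sorted(power(X, 2)))             # ['aa', 'abc', 'bca', 'bcbc']
print(sorted(closure_up_to(X, 3)))     # ['', 'a', 'aa', 'aaa', 'abc', 'bc', 'bca']
```

Truncating by length is safe here because every factor of a string in a product is at most as long as the string itself. Every string of X* within the bound is therefore reached through shorter strings that are also within the bound.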
  2. Now, the definition you have all been waiting for:
    Context-free Grammar
    A context-free grammar G = (Vt, Vn, S, P) is composed of:
    1. A finite alphabet Vt called the terminal symbols.
    2. A finite alphabet Vn called the non-terminal symbols.
    3. A distinguished element of the set Vn denoted by the symbol S and referred to as the goal symbol or start symbol.
    4. A finite set P of pairs, called productions, each composed of one element from Vn and one element from (Vt ∪ Vn)*. A production (A, X1 X2 ... Xm) is written in the form:
      A → X1 X2 ... Xm
      (with m = 0 giving the production A → eps).
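One way to make the four-part definition concrete is to store the tuple G = (Vt, Vn, S, P) directly. In the Python sketch below, the class `Grammar` and its `check` method are hypothetical. Each symbol is a string, and each production is a pair `(A, rhs)` whose right-hand side is a tuple of symbols, so the empty tuple stands for A → eps. `check` verifies only the conditions listed above. The usage line builds the < blob >/< glob > grammar G from item 5 and assumes < blob > is its start symbol.

```python
from dataclasses import dataclass

Symbol = str
Production = tuple[Symbol, tuple[Symbol, ...]]  # (A, (X1, ..., Xm))

@dataclass(frozen=True)
class Grammar:
    terminals: frozenset[Symbol]        # Vt
    nonterminals: frozenset[Symbol]     # Vn
    start: Symbol                       # S
    productions: frozenset[Production]  # P

    def check(self) -> None:
        """Raise ValueError if the four components violate the definition."""
        if not self.terminals or not self.nonterminals:
            raise ValueError("Vt and Vn must be non-empty alphabets")
        if self.start not in self.nonterminals:
            raise ValueError("the start symbol must be an element of Vn")
        vocabulary = self.terminals | self.nonterminals
        for lhs, rhs in self.productions:
            if lhs not in self.nonterminals:
                raise ValueError(f"left-hand side {lhs!r} is not in Vn")
            for sym in rhs:
                if sym not in vocabulary:
                    raise ValueError(f"symbol {sym!r} is not in Vt ∪ Vn")

G = Grammar(
    terminals=frozenset({"x", "y", "z", "a"}),
    nonterminals=frozenset({"<blob>", "<glob>"}),
    start="<blob>",  # assumption: < blob > is the start symbol
    productions=frozenset({
        ("<blob>", ("x", "<glob>", "<blob>", "y")),
        ("<blob>", ("z",)),
        ("<glob>", ("a", "<glob>")),
        ("<glob>", ()),  # < glob > → eps
    }),
)
G.check()  # raises nothing: G satisfies the definition
```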
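  3. There is an interesting, alternate approach to interpreting the productions of a grammar. Rather than viewing them as rules for producing strings that belong to the language defined, we can view them as set inequalities over a set of variables composed of the non-terminal symbols. Each production A → X1 X2 ... Xm is read as the inequality A ⊇ X1 X2 ... Xm, where the right-hand side is a product of languages and each terminal t stands for the one-element set {t}. The language associated with each non-terminal is then the smallest collection of sets that satisfies all of these inequalities.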
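The set-inequality reading suggests a way to compute the sets. Start every non-terminal at the empty set, then repeatedly add whatever strings each production's product demands until nothing changes. The Python sketch below does this for the < blob >/< glob > grammar G of item 5. The helper `least_solution` and the dictionary encoding `GRAMMAR` are hypothetical. Because the sets are infinite, the sketch keeps only strings of length at most `max_len`. This makes every set finite and guarantees that the iteration stops.

```python
from itertools import product as cartesian

# Hypothetical encoding of G: non-terminal -> list of right-hand sides.
# Any symbol that is not a key is a terminal.
GRAMMAR = {
    "<blob>": [("x", "<glob>", "<blob>", "y"), ("z",)],
    "<glob>": [("a", "<glob>"), ()],  # () is the right-hand side eps
}

def least_solution(grammar, max_len):
    """Smallest sets with A ⊇ X1 X2 ... Xm for every production,
    truncated to strings of length at most max_len."""
    sets = {A: set() for A in grammar}  # start every variable at the empty set

    def lang(sym):
        # A terminal t stands for the one-element set {t}.
        return sets[sym] if sym in grammar else {sym}

    changed = True
    while changed:
        changed = False
        for A, rhss in grammar.items():
            for rhs in rhss:
                # itertools.product copies its inputs first, so adding to
                # sets[A] inside this loop is safe.
                for parts in cartesian(*(lang(X) for X in rhs)):
                    s = "".join(parts)
                    if len(s) <= max_len and s not in sets[A]:
                        sets[A].add(s)
                        changed = True
    return sets

sets = least_solution(GRAMMAR, 4)
print(sorted(sets["<glob>"]))  # ['', 'a', 'aa', 'aaa', 'aaaa']
print(sorted(sets["<blob>"]))  # ['xazy', 'xzy', 'z']
```

The truncation does not lose anything below the bound. The pieces of a short string in a product are themselves short, so each set ends up holding exactly the members of the true solution up to `max_len` symbols.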
  4. The association between a context free grammar and the language it describes is formalized through the notion of a derivation:
    Direct derivation
    Given a grammar, G = (Vt, Vn, S, P), and two strings x and y in (Vt ∪ Vn)* such that x = αAγ and y = αβγ, where α, β, γ ∈ (Vt ∪ Vn)* and (A, β) ∈ P, we say that x directly derives y. In this case we write
    x ⇒ y
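  5. Examples of direct derivations.

    Consider the grammar G with the following productions (x, y, z, and a are terminals; < blob > and < glob > are non-terminals):

    < blob > → x < glob > < blob > y
    < blob > → z
    < glob > → a < glob >
    < glob > → eps

    Some direct derivations in G:

    < blob > ⇒ x < glob > < blob > y
    x < glob > < blob > y ⇒ x a < glob > < blob > y
    x < glob > < blob > y ⇒ x < blob > y
    x < glob > < blob > y ⇒ x < glob > z y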
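The definition of direct derivation translates into a function that lists every y with x ⇒ y. For each position in x that holds a non-terminal A, and for each production (A, β), the function replaces that occurrence of A with β. In the Python sketch below, the function name `direct_derivations` and the dictionary encoding of G are hypothetical. Sentential forms are tuples of symbols, so α and γ are simply the slices before and after the rewritten position.

```python
# Hypothetical encoding of G: non-terminal -> list of right-hand sides.
GRAMMAR = {
    "<blob>": [("x", "<glob>", "<blob>", "y"), ("z",)],
    "<glob>": [("a", "<glob>"), ()],  # () is the right-hand side eps
}

def direct_derivations(x, grammar):
    """All y such that x ⇒ y, i.e. x = αAγ and y = αβγ with (A, β) in P."""
    results = []
    for i, A in enumerate(x):
        if A not in grammar:          # only non-terminals can be rewritten
            continue
        alpha, gamma = x[:i], x[i + 1:]
        for beta in grammar[A]:
            results.append(alpha + beta + gamma)
    return results

def show(form):
    return " ".join(form) if form else "eps"

start = ("x", "<glob>", "<blob>", "y")
for y in direct_derivations(start, GRAMMAR):
    print(show(start), "=>", show(y))
# x <glob> <blob> y => x a <glob> <blob> y
# x <glob> <blob> y => x <blob> y
# x <glob> <blob> y => x <glob> x <glob> <blob> y y
# x <glob> <blob> y => x <glob> z y
```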

  6. More on the notion of a derivation:
    Derivation
    Given a grammar, G = (Vt, Vn, S, P), and two strings x and y in (Vt ∪ Vn)*, we say that x derives y if there exists a sequence of strings α0, α1, α2, ..., αm, all in (Vt ∪ Vn)*, such that
    1. for all i < m, αi ⇒ αi+1,
    2. x = α0, and
    3. y = αm.
    In this case we write
    x ⇒* y
    The sequence α0, α1, α2, ..., αm is called a derivation of length m of y from x.
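To check whether a sequence α0, α1, ..., αm is a derivation, it is enough to test each consecutive pair against the definition of ⇒. The Python sketch below returns the length m of a valid derivation and `None` otherwise. Its helpers `directly_derives` and `derivation_length` and the encoding of G are hypothetical. The sketch checks a candidate derivation of xaxzyy from < blob > in the grammar G of item 5.

```python
# Hypothetical encoding of G: non-terminal -> list of right-hand sides.
GRAMMAR = {
    "<blob>": [("x", "<glob>", "<blob>", "y"), ("z",)],
    "<glob>": [("a", "<glob>"), ()],  # () is the right-hand side eps
}

def directly_derives(x, y, grammar):
    """True if x = αAγ and y = αβγ for some production (A, β)."""
    for i, A in enumerate(x):
        for beta in grammar.get(A, []):   # terminals have no productions
            if x[:i] + beta + x[i + 1:] == y:
                return True
    return False

def derivation_length(forms, grammar):
    """Return m if forms = [α0, ..., αm] is a derivation, else None."""
    for prev, nxt in zip(forms, forms[1:]):
        if not directly_derives(prev, nxt, grammar):
            return None
    return len(forms) - 1  # a single form is a derivation of length 0

CANDIDATE = [tuple(f.split()) for f in [
    "<blob>",
    "x <glob> <blob> y",
    "x a <glob> <blob> y",
    "x a <blob> y",
    "x a x <glob> <blob> y y",
    "x a x <blob> y y",
    "x a x z y y",
]]
print(derivation_length(CANDIDATE, GRAMMAR))  # 6
```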
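  7. Using the grammar G shown above, we can say that < blob > ⇒* xaxzyy since, for example, the following is a derivation of length 6:
    < blob > ⇒ x < glob > < blob > y
             ⇒ x a < glob > < blob > y
             ⇒ x a < blob > y
             ⇒ x a x < glob > < blob > y y
             ⇒ x a x < blob > y y
             ⇒ x a x z y y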
  8. Time for more definitions:
    Sentential form
    Given a grammar, G, a string α ∈ (Vt ∪ Vn)* is called a sentential form of G if it is derivable from the start symbol of G, that is, if S ⇒* α.
    Sentence
    A sentential form containing only symbols from the terminal vocabulary Vt of the grammar is called a sentence.
    L(G)
    The language defined by a grammar G is the set of all sentences of G:
    L(G) = { s | S ⇒* s and s ∈ Vt* }
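L(G) is defined through derivations, so a simple (if inefficient) way to test whether a terminal string s is in L(G) is a breadth-first search. The search explores sentential forms derivable from S until s turns up, and returns the derivation it found. In the Python sketch below, the function `in_language` and the encoding of G are hypothetical, and < blob > is assumed to be the start symbol. To keep the search finite, the sketch discards any sentential form containing more terminals than s. This pruning is sound because a derivation step rewrites only a non-terminal and so never removes terminals. It guarantees termination, however, only for grammars like G where finitely many sentential forms have a given number of terminals. Dedicated parsing algorithms such as CYK or Earley are the practical way to decide membership in general.

```python
from collections import deque

# Hypothetical encoding of G: non-terminal -> list of right-hand sides.
GRAMMAR = {
    "<blob>": [("x", "<glob>", "<blob>", "y"), ("z",)],
    "<glob>": [("a", "<glob>"), ()],  # () is the right-hand side eps
}

def in_language(s, grammar, start):
    """Return a derivation [S, ..., s] if s is in L(G), else None."""
    def terminals_in(form):
        return sum(1 for sym in form if sym not in grammar)

    start_form = (start,)
    parent = {start_form: None}  # visited forms, mapped to the form they came from
    queue = deque([start_form])
    while queue:
        form = queue.popleft()
        if form == s:
            # Follow parent links back to the start symbol.
            path = []
            while form is not None:
                path.append(form)
                form = parent[form]
            return path[::-1]
        for i, A in enumerate(form):
            for beta in grammar.get(A, []):   # only non-terminals are rewritten
                nxt = form[:i] + beta + form[i + 1:]
                # Terminals are never removed, so forms with too many can be dropped.
                if nxt not in parent and terminals_in(nxt) <= len(s):
                    parent[nxt] = form
                    queue.append(nxt)
    return None

derivation = in_language(tuple("xaxzyy"), GRAMMAR, "<blob>")
for form in derivation:
    print(" ".join(form))
```

Because the search is breadth-first, the derivation it returns is a shortest one. For xaxzyy in G, every derivation has length 6.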

Computer Science 434
Department of Computer Science
Williams College