How Lex Works

  1. Let's use all the good things you learned about regular languages and finite state machines to figure out how a program like Lex works. (We will save figuring out Yacc for after break.)
  2. Lex starts with a collection of regular expressions.
  3. First, recognize that there are a number of things we can do to reduce the apparent complexity of the language of regular expressions before we even begin.
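
The notes do not list which simplifications are meant, so the following is only a sketch under an assumption: that operators such as r+, r? and character classes are treated as shorthand and rewritten using only symbols, the empty string, concatenation, alternation and star. The tuple-based tree encoding and the name desugar are hypothetical, chosen just for this illustration.

```python
# Regular expressions as nested tuples (an encoding invented for this sketch):
#   core forms:  ("sym", c)  ("eps",)  ("cat", r, s)  ("alt", r, s)  ("star", r)
#   shorthand:   ("plus", r)  ("opt", r)  ("class", "abc")

def desugar(r):
    """Rewrite shorthand operators so that only the core forms remain."""
    kind = r[0]
    if kind in ("sym", "eps"):
        return r
    if kind in ("cat", "alt"):
        return (kind, desugar(r[1]), desugar(r[2]))
    if kind == "star":
        return ("star", desugar(r[1]))
    if kind == "plus":                      # r+  becomes  r r*
        core = desugar(r[1])
        return ("cat", core, ("star", core))
    if kind == "opt":                       # r?  becomes  r | (empty string)
        return ("alt", desugar(r[1]), ("eps",))
    if kind == "class":                     # [abc]  becomes  a | b | c
        chars = r[1]
        if not chars:
            raise ValueError("empty character class")
        result = ("sym", chars[0])
        for c in chars[1:]:
            result = ("alt", result, ("sym", c))
        return result
    raise ValueError(f"unknown node: {kind}")
```

After this pass, the later constructions only need to handle single symbols, the empty string, concatenation, alternation and star.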
  4. Next, recall the structure of a deterministic finite state machine.
    1. A finite set of states, Q.
    2. An input alphabet, Σ.
    3. A transition function δ : Q × Σ → Q.
    4. A subset F of Q called the set of final states.
    5. An element q0 of Q called the initial state.
  5. Although what I have shown above is the standard way to define an FSA, it is worth noting that there is one oddly unmathematical aspect of this definition. The transition function δ is undefined on many elements in its domain. When it is helpful, we can eliminate this oddity by adding an error state, with the idea that in any case where δ would have been undefined we will define its result to be the error state.
  6. While you are at it, recall (or at least note) that we can explain the behavior of a deterministic finite state machine by defining a function δ* that extends δ to strings over the input alphabet. In particular, we can define δ* : Q × Σ* → Q recursively as
    δ*(q, ε) = q
    δ*(q, wa) = δ(δ*(q, w), a)    (w ∈ Σ*, a ∈ Σ)
    and then state that the language accepted by the machine is
    { w ∈ Σ* | δ*(q0, w) ∈ F }
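
To make the last few points concrete, here is a minimal sketch of a deterministic machine in Python. The machine itself (an "a" followed by any number of "b"s) and all of the names are made up for illustration. The transition table is deliberately partial, the function delta makes it total by sending every missing entry to an explicit error state, and delta_star follows the recursive definition of the extension to strings.

```python
ERROR = "error"          # explicit error state; once entered, it is never left
ALPHABET = {"a", "b"}
START = "q0"
FINAL = {"q1"}

# Partial transition table for the hypothetical language "a" followed by "b"s.
# Entries such as ("q0", "b") are intentionally missing.
PARTIAL_DELTA = {
    ("q0", "a"): "q1",
    ("q1", "b"): "q1",
}

def delta(state, symbol):
    """Total transition function: any undefined entry goes to ERROR."""
    if symbol not in ALPHABET:
        raise ValueError(f"symbol {symbol!r} is not in the input alphabet")
    return PARTIAL_DELTA.get((state, symbol), ERROR)

def delta_star(state, word):
    """Extension of delta to strings, defined by recursion on the word."""
    if word == "":
        return state                      # empty string: stay where we are
    # word = w + a: run the machine on w, then take one step on a
    return delta(delta_star(state, word[:-1]), word[-1])

def accepts(word):
    """A word is accepted if it drives the start state into a final state."""
    return delta_star(START, word) in FINAL
```

For example, accepts("abb") is true, while accepts("ba") is false because the first step already lands in the error state. A real scanner would use a loop rather than recursion; the recursive form is kept here only because it mirrors the definition.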
  7. Building an FSA to match a single symbol x is quite easy.
  8. Building an FSA to match an RE of the form r1 r2 (i.e. concatenation) is a bit trickier.
  9. We can simplify this issue by giving up one (very key) property of the automaton we have been building: its deterministic behavior. That is, if we allow ourselves to build a non-deterministic finite state machine instead of a deterministic one, the construction becomes much easier.
  10. So, now consider how to define a non-deterministic finite state machine:
    1. A finite set of states, Q.
    2. An input alphabet, Σ.
    3. A transition function δ : Q × (Σ ∪ {ε}) → 2^Q.
    4. A subset F of Q called the set of final states.
    5. An element q0 of Q called the initial state.
  11. As we did for FSAs, we would like to define an extension of δ to a function that handles transitions on strings rather than single symbols from Σ.
  12. Working with NFSAs also makes it easy to construct a machine for an RE of the form r*.
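
The constructions for a single symbol, concatenation and star can be sketched in Python roughly as follows. This follows the usual textbook (Thompson-style) approach, which may differ in detail from the construction given in class. The Fragment class, the use of None as the label for empty-string transitions, and all function names are assumptions made for this sketch. The accepts function handles strings by tracking the set of states the machine could be in, closing that set under empty-string transitions after every step.

```python
import itertools

EPS = None                  # label for empty-string transitions (arbitrary choice)
_ids = itertools.count()    # supplies fresh state names 0, 1, 2, ...

class Fragment:
    """An NFSA with one start state, one final state, and a transition map
    from (state, label) pairs to sets of states."""
    def __init__(self, start, final, delta):
        self.start, self.final, self.delta = start, final, delta

def _merge(*deltas):
    """Union of several transition maps, without modifying the inputs."""
    merged = {}
    for d in deltas:
        for key, targets in d.items():
            merged.setdefault(key, set()).update(targets)
    return merged

def symbol(x):
    """Machine for the single symbol x: start --x--> final."""
    s, f = next(_ids), next(_ids)
    return Fragment(s, f, {(s, x): {f}})

def concat(m1, m2):
    """Machine for r1 r2: glue m1's final state to m2's start state.
    Assumes m1 and m2 were built separately, so their states are disjoint."""
    glue = {(m1.final, EPS): {m2.start}}
    return Fragment(m1.start, m2.final, _merge(m1.delta, m2.delta, glue))

def star(m):
    """Machine for r*: new start and final states that allow skipping m
    entirely or looping back through it any number of times."""
    s, f = next(_ids), next(_ids)
    extra = {(s, EPS): {m.start, f}, (m.final, EPS): {m.start, f}}
    return Fragment(s, f, _merge(m.delta, extra))

def eps_closure(states, delta):
    """All states reachable from `states` using only empty-string moves."""
    stack, seen = list(states), set(states)
    while stack:
        q = stack.pop()
        for nxt in delta.get((q, EPS), ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen

def accepts(m, word):
    """Run the machine on a string by tracking every state it could be in."""
    current = eps_closure({m.start}, m.delta)
    for ch in word:
        moved = set()
        for q in current:
            moved |= m.delta.get((q, ch), set())
        current = eps_closure(moved, m.delta)
    return m.final in current
```

For example, concat(symbol("a"), star(symbol("b"))) builds a machine for ab*, and accepts reports true for "a" and "abb" but false for "" and "ba". Note that accepts is already doing, one input string at a time, the bookkeeping that a conversion to a deterministic machine would do once and for all.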
  13. Now we can build a complete NFSA for each RE provided in a Lex input file. This leaves us with two problems.
  14. To be continued....

Computer Science 434
Department of Computer Science
Williams College
