A Word About LALR(1) Parsing

We've got a problem...
- The machines that result from the LR(1) construction tend to be too large to use in practice.
- The SLR(1) technique is too weak (i.e., there are too many grammars that LR(1) techniques can parse deterministically but whose LR(0) machines still have SLR(1) conflicts).
LALR parsing attempts to find a happy medium.
Given a set of LR(1) items, we can obtain a set of LR(0) items by dropping the lookahead from each LR(1) item. We call this set the core of the original set of LR(1) items.
- As an example, the core of the state in the LR(1) machine reached upon reading "(S" is just the state of the LR(0) machine reached on the same input.
The core of each state in the LR(1) machine will correspond to some state in the LR(0) machine. (Many states in the LR(1) machine may have the same state of the LR(0) machine as their core.)
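To make "core" concrete, here is a minimal Python sketch. It assumes a hypothetical encoding of an LR(1) item as a tuple (lhs, rhs, dot, lookahead), where rhs is a tuple of symbols and dot is the index of the dot. The two example states are illustrative item sets built from productions of the example grammar that follows. They differ only in their lookaheads, so they are distinct LR(1) states with the same LR(0) core.

```python
from typing import FrozenSet, Tuple

# Hypothetical item encoding: (lhs, rhs, dot position, lookahead)
Item = Tuple[str, Tuple[str, ...], int, str]
Core = FrozenSet[Tuple[str, Tuple[str, ...], int]]


def core(state: FrozenSet[Item]) -> Core:
    """Drop the lookahead from every LR(1) item, leaving a set of LR(0) items."""
    return frozenset((lhs, rhs, dot) for lhs, rhs, dot, _ in state)


# Two illustrative LR(1) states that differ only in their lookaheads.
s1 = frozenset({("A", ("id", "=", "A"), 1, "!"), ("E", ("id",), 1, "!")})
s2 = frozenset({("A", ("id", "=", "A"), 1, ")"), ("E", ("id",), 1, ")")})

print(s1 == s2)              # False: distinct LR(1) states
print(core(s1) == core(s2))  # True: the same LR(0) core
```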
Using these facts, for each state in the original LR(0) machine for a grammar we can define a corresponding set of LR(1) items. For an LR(0) state σ, we will use the set of LR(1) items defined by

$$\{\, [N \to \alpha_1 \,.\, \alpha_2,\ x] \;\mid\; [N \to \alpha_1 \,.\, \alpha_2,\ x] \in \sigma' \text{ for some state } \sigma' \text{ of the LR(1) machine with } \sigma = \mathrm{core}(\sigma') \,\}$$
The LALR(1) machine for a grammar is formed by replacing each of the sets of LR(0) items associated with the states of the LR(0) machine with sets of LR(1) items in the way just described.
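The merge step can be sketched in a few lines of Python. Grouping LR(1) states by core and taking the union of their items yields exactly one set of LR(1) items per LR(0) state. The item encoding (lhs, rhs, dot, lookahead) and the function names are our own illustrative assumptions, and the sketch merges only item sets, not the transitions between states.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, List, Set, Tuple

Item = Tuple[str, Tuple[str, ...], int, str]        # (lhs, rhs, dot, lookahead)
Core = FrozenSet[Tuple[str, Tuple[str, ...], int]]  # a set of LR(0) items


def core(state: FrozenSet[Item]) -> Core:
    return frozenset((lhs, rhs, dot) for lhs, rhs, dot, _ in state)


def lalr_states(lr1_states: List[FrozenSet[Item]]) -> Dict[Core, FrozenSet[Item]]:
    """Union the items of all LR(1) states that share the same core."""
    merged: Dict[Core, Set[Item]] = defaultdict(set)
    for state in lr1_states:
        merged[core(state)] |= state
    return {c: frozenset(items) for c, items in merged.items()}


s1 = frozenset({("A", ("id", "=", "A"), 1, "!"), ("E", ("id",), 1, "!")})
s2 = frozenset({("A", ("id", "=", "A"), 1, ")"), ("E", ("id",), 1, ")")})
merged = lalr_states([s1, s2])
print(len(merged))                         # 1: one LALR state for the shared core
print(len(next(iter(merged.values()))))    # 4: each item now appears with both lookaheads
```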
Consider how this works on the grammar:

S → id = A !
A → id = A
  | E
E → id
  | ( id ! )
  | ( A )
The mess below is the LR(0) machine for the grammar. Note that two states (8 and 9) contain LR(0) conflicts and that one of them (9) also has an SLR(1) conflict, since an E can be followed by either a ")" or an exclamation point.
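The SLR(1) conflict can be checked with a short Python sketch. It computes FOLLOW sets by fixed-point iteration and then intersects FOLLOW(E) with the shift symbols of the LR(0) state entered after reading "id = ( id". That is presumably state 9 in the text's numbering. The sketch assumes an augmented start production S' → S with end marker "$". It also relies on this grammar having no ε-productions. The item encoding (production index, dot position) and all function names are our own.

```python
from typing import Dict, List, Set, Tuple

# The example grammar, augmented with S' -> S (our assumption); "$" marks end of input.
GRAMMAR: List[Tuple[str, Tuple[str, ...]]] = [
    ("S'", ("S",)),                 # 0
    ("S", ("id", "=", "A", "!")),   # 1
    ("A", ("id", "=", "A")),        # 2
    ("A", ("E",)),                  # 3
    ("E", ("id",)),                 # 4
    ("E", ("(", "id", "!", ")")),   # 5
    ("E", ("(", "A", ")")),         # 6
]
NONTERMS = {lhs for lhs, _ in GRAMMAR}


def first_sym(x: str, first: Dict[str, Set[str]]) -> Set[str]:
    return first[x] if x in NONTERMS else {x}


def first_sets() -> Dict[str, Set[str]]:
    # No production derives the empty string, so FIRST(rhs) = FIRST(rhs[0]).
    first: Dict[str, Set[str]] = {n: set() for n in NONTERMS}
    changed = True
    while changed:
        changed = False
        for lhs, rhs in GRAMMAR:
            add = first_sym(rhs[0], first)
            if not add <= first[lhs]:
                first[lhs] |= add
                changed = True
    return first


def follow_sets(first: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    follow: Dict[str, Set[str]] = {n: set() for n in NONTERMS}
    follow["S'"].add("$")
    changed = True
    while changed:
        changed = False
        for lhs, rhs in GRAMMAR:
            for i, x in enumerate(rhs):
                if x not in NONTERMS:
                    continue
                # FIRST of the next symbol, or FOLLOW(lhs) if x ends the production.
                add = first_sym(rhs[i + 1], first) if i + 1 < len(rhs) else follow[lhs]
                if not add <= follow[x]:
                    follow[x] |= add
                    changed = True
    return follow


follow = follow_sets(first_sets())
print(sorted(follow["E"]))  # ['!', ')']

# LR(0) state after "id = ( id": E -> ( id . ! ),  A -> id . = A,  E -> id .
state = [(5, 2), (2, 1), (4, 1)]
shifts = {GRAMMAR[p][1][d] for p, d in state if d < len(GRAMMAR[p][1])}
reduce_on = {la for p, d in state if d == len(GRAMMAR[p][1]) for la in follow[GRAMMAR[p][0]]}
print(sorted(shifts & reduce_on))  # ['!']: SLR(1) shift/reduce conflict
```

The overlap on "!" is the SLR(1) conflict. SLR reduces E → id on every symbol in FOLLOW(E). Inside parentheses, though, this E can only be followed by ")", which is precisely the information LR(1) lookaheads preserve.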
The even bigger mess below is (hopefully) the LR(1) machine for the grammar. The following handy guide lists the states of the LR(0) machine and the states of the LR(1) machine to which they correspond.
LR(0) state   LR(1) states
0             0
1             1
2             2
3             3
4             4, 15
5             5, 13
6             6, 12
7             7
8             8, 23
9             9, 18
10            10, 16
11            11, 17
12            14, 22
13            20, 21
14            24, 25
Finally, here we see the LALR(1) machine for this grammar. The states are numbered to match the numbering of the states from the LR(0) machine. Each state contains the union of the LR(1) items found in the LR(1) states whose core is equivalent to the corresponding LR(0) state.
Note that the LALR(1) machine has no conflicts.
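The whole construction can also be done mechanically. Below is a compact Python sketch under stated assumptions. It builds the canonical LR(1) machine, merges states with equal cores into the LALR(1) machine, and checks every state for shift/reduce and reduce/reduce conflicts. It adds an augmented production S' → S with end marker "$", and its FIRST computation assumes no ε-productions (true for this grammar). All names are hypothetical. Because of the added start production and the traversal order, its state counts and numbering need not match the figures.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

GRAMMAR: List[Tuple[str, Tuple[str, ...]]] = [
    ("S'", ("S",)),  # augmented start production (our assumption)
    ("S", ("id", "=", "A", "!")),
    ("A", ("id", "=", "A")), ("A", ("E",)),
    ("E", ("id",)), ("E", ("(", "id", "!", ")")), ("E", ("(", "A", ")")),
]
NONTERMS = {lhs for lhs, _ in GRAMMAR}
Item = Tuple[int, int, str]  # (production index, dot position, lookahead)


def first_sets() -> Dict[str, Set[str]]:
    # Assumes no ε-productions, so FIRST(rhs) = FIRST(rhs[0]).
    first: Dict[str, Set[str]] = {n: set() for n in NONTERMS}
    changed = True
    while changed:
        changed = False
        for lhs, rhs in GRAMMAR:
            add = first[rhs[0]] if rhs[0] in NONTERMS else {rhs[0]}
            if not add <= first[lhs]:
                first[lhs] |= add
                changed = True
    return first


FIRST = first_sets()


def closure(items) -> FrozenSet[Item]:
    result, work = set(items), list(items)
    while work:
        p, dot, la = work.pop()
        rhs = GRAMMAR[p][1]
        if dot < len(rhs) and rhs[dot] in NONTERMS:
            rest = rhs[dot + 1:]  # new lookaheads are FIRST(rest la)
            las = {la} if not rest else (FIRST[rest[0]] if rest[0] in NONTERMS else {rest[0]})
            for q, (lhs, _) in enumerate(GRAMMAR):
                if lhs == rhs[dot]:
                    for b in las:
                        if (q, 0, b) not in result:
                            result.add((q, 0, b))
                            work.append((q, 0, b))
    return frozenset(result)


def goto(state: FrozenSet[Item], x: str) -> FrozenSet[Item]:
    return closure({(p, d + 1, la) for p, d, la in state
                    if d < len(GRAMMAR[p][1]) and GRAMMAR[p][1][d] == x})


def lr1_machine() -> List[FrozenSet[Item]]:
    states = [closure({(0, 0, "$")})]
    seen = {states[0]}
    i = 0
    while i < len(states):  # worklist: visit every state, including new ones
        for x in {GRAMMAR[p][1][d] for p, d, _ in states[i] if d < len(GRAMMAR[p][1])}:
            nxt = goto(states[i], x)
            if nxt not in seen:
                seen.add(nxt)
                states.append(nxt)
        i += 1
    return states


def lalr_machine(lr1: List[FrozenSet[Item]]) -> List[FrozenSet[Item]]:
    merged: Dict[FrozenSet[Tuple[int, int]], Set[Item]] = {}
    for state in lr1:  # union the items of states that share a core
        merged.setdefault(frozenset((p, d) for p, d, _ in state), set()).update(state)
    return [frozenset(items) for items in merged.values()]


def has_conflict(state: FrozenSet[Item]) -> bool:
    shifts = {GRAMMAR[p][1][d] for p, d, _ in state
              if d < len(GRAMMAR[p][1]) and GRAMMAR[p][1][d] not in NONTERMS}
    reduce_on: Dict[str, int] = {}
    for p, d, la in state:
        if d == len(GRAMMAR[p][1]):
            if la in shifts or reduce_on.get(la, p) != p:
                return True  # shift/reduce or reduce/reduce conflict
            reduce_on[la] = p
    return False


lr1 = lr1_machine()
lalr = lalr_machine(lr1)
print(len(lr1), len(lalr))
print(any(has_conflict(s) for s in lalr))  # False: no LALR(1) conflicts
```

Merging by core can only combine lookaheads, never add shift items. So if the LALR(1) machine is conflict-free, the larger LR(1) machine is too.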

Computer Science 434
Department of Computer Science
Williams College