Martin Sulzmann
Bottom-up parsing
Scan input from left
Recognize right-hand side of production rules
Shift/reduce operations
Let’s build the LR(0) item set for the following grammar.
S -> E
E -> id
E -> (E)
E -> E+E
E -> E*E
We start with
I0 : { S -> .E, E -> .id, E -> .(E), E -> .E+E, E -> .E*E }
Recall that the . indicates how much of a right-hand
side we have already recognized. If the . hits a
non-terminal, we pull in the respective rules for this non-terminal.
We build item sets by checking which ‘symbols’ can be consumed. A symbol can either be a terminal or a non-terminal (symbol, here you go we overload the term symbol).
Starting from I0, possible symbols to be consumed are:
id, (, E. This results into the
following three item sets.
consume id
I1: { E -> id. }
consume (
I2: { E -> (.E), E -> .id, E -> .(E), E -> .E+E, E -> .E*E }
consume E
I3: { S -> E., E -> E.+E, E -> E.*E }
Observations:
We seem to have a shift/reduce conflict for I3:
S -> E. versus E -> E.+E. However, we
can resolve this conflict as the reduction S -> E. only
applies in case we hit the EOF symbol.
The process of “shifting” the . across the
right-hand side resembles the derivative operation we have seen in the
context of regular expressions.
Let’s continue to build the remaining item sets and transitions. We
write StartItem - symbol -> NextItem to indicate the
resulting transitions of the FSA obtained by building item sets.
I0 - id -> I1
I0 - ( -> I2
I0 - E -> I3
I2 - id -> I1
I2 - ( -> I2
I2 - E -> I4
I3 - + -> I5
I3 - * -> I6
I4 - ) -> I7
I4 - + -> I5
I4 - * -> I6
I5 - id -> I1
I5 - ( -> I2
I5 - E -> I7
I6 - id -> I1
I6 - ( -> I2
I6 - E -> I8
I7 - + -> I5
I7 - * -> I6
I8 - + -> I5
I8 - * -> I6
I0 : { S -> .E, E -> .id, E -> .(E), E -> .E+E, E -> .E*E }
// I0 - id -> I1
I1: { E -> id. }
// I0 - ( -> I2
I2: { E -> (.E), E -> .id, E -> .(E), E -> .E+E, E -> .E*E }
// I0 - E -> I3
I3: { S -> E., E -> E.+E, E -> E.*E }
// I2 - E -> I4
I4 : { E -> (E.), E -> E.+E, E -> E.*E }
// I3 - + -> I5
I5 : { E -> E+.E, E -> .id, E -> .(E), E -> .E+E, E -> .E*E }
// I3 - * -> I6
I6 : { E -> E*.E, E -> .id, E -> .(E), E -> .E+E, E -> .E*E }
// I5 - E -> I7
I7: { E -> E+E., E -> E.+E, E -> E.*E }
// I6 - E -> I8
I8: { E -> E*E., E -> E.+E, E -> E.*E }
Observations.
We encounter shift/reduce conflicts in I7 and
I8.
This is no surprise as we know that this grammar is unambiguous!
How to resolve these conflicts?
S -> E
E -> T+E
E -> T
T -> F*T
T -> F
F -> id
F -> (E)
* has precdence over +
Both operations are left-associative
If we are in state 9 (item set I7) and lookahead is
+, then reduce (left-associative). If lookahead is
*, then shift (precedence).
If we are in state 10 (item set I8) and lookahead is
*, then reduce (left-associative). If lookahead is
+, then reduce again (precedence).
This special case can be deal with in yacc by specifying
that * has a higher precedence than +.
Recall
Stmt -> if E then Stmt else Stmt
Stmt -> if E then Stmt
Stmt -> other
This grammar is ambiguous. For example, consider
if expr then if expr then other else other
Disambiguation: “Match closest then”.
Stmt -> Match
Stmt -> Unmatch
Stmt -> other
Match -> if E then Match else Match
Match -> other
Unmatch -> if E then Stmt
Unmatch -> if E then Match else Unmatch
This grammar should be accepted by yacc, hence, belongs to the class LALR(1).