LR Parsing - Examples

Martin Sulzmann

Highlights

Bottom-up parsing
Scan input from left
Recognize right-hand side of production rules
Shift/reduce operations

Example 1 - Arithmetic expressions

Let’s build the LR(0) item set for the following grammar.

S -> E
E -> id
E -> (E)
E -> E+E
E -> E*E

We start with

I0 : { S -> .E, E -> .id, E -> .(E), E -> .E+E, E -> .E*E }

Recall that the . indicates how much of a right-hand side we have already recognized. If the . hits a non-terminal, we pull in the respective rules for this non-terminal.

We build item sets by checking which ‘symbols’ can be consumed. A symbol can either be a terminal or a non-terminal (symbol, here you go we overload the term symbol).

Starting from I0, possible symbols to be consumed are: id, (, E. This results into the following three item sets.

consume id
I1: { E -> id. }

consume (
I2: { E -> (.E), E -> .id, E -> .(E), E -> .E+E, E -> .E*E }

consume E
I3: { S -> E., E -> E.+E, E -> E.*E }

Observations:

We seem to have a shift/reduce conflict for I3: S -> E. versus E -> E.+E. However, we can resolve this conflict as the reduction S -> E. only applies in case we hit the EOF symbol.
The process of “shifting” the . across the right-hand side resembles the derivative operation we have seen in the context of regular expressions.

Let’s continue to build the remaining item sets and transitions. We write StartItem - symbol -> NextItem to indicate the resulting transitions of the FSA obtained by building item sets.

I0 - id -> I1
I0 - ( ->  I2
I0 - E ->  I3


I2 - id -> I1
I2 - ( -> I2
I2 - E -> I4

I3 - + -> I5
I3 - * -> I6

I4 - ) -> I7
I4 - + -> I5
I4 - * -> I6

I5 - id -> I1
I5 - (  -> I2
I5 - E  -> I7

I6 - id -> I1
I6 - (  -> I2
I6 - E  -> I8

I7 - +  -> I5
I7 - *  -> I6

I8 - +  -> I5
I8 - *  -> I6


I0 : { S -> .E, E -> .id, E -> .(E), E -> .E+E, E -> .E*E }

// I0 - id -> I1
I1: { E -> id. }

// I0 - ( -> I2
I2: { E -> (.E), E -> .id, E -> .(E), E -> .E+E, E -> .E*E }

// I0 - E -> I3
I3: { S -> E., E -> E.+E, E -> E.*E }

// I2 - E -> I4
I4 : { E -> (E.), E -> E.+E, E -> E.*E }

// I3 - + -> I5
I5 : { E -> E+.E, E -> .id, E -> .(E), E -> .E+E, E -> .E*E }

// I3 - * -> I6
I6 : { E -> E*.E, E -> .id, E -> .(E), E -> .E+E, E -> .E*E }

// I5 - E -> I7
I7: { E -> E+E., E -> E.+E, E -> E.*E }

// I6 - E -> I8
I8: { E -> E*E., E -> E.+E, E -> E.*E }

Observations.

We encounter shift/reduce conflicts in I7 and I8.
This is no surprise as we know that this grammar is unambiguous!

How to resolve these conflicts?

Rewrite into an unambiguous grammar!

S -> E
E -> T+E
E -> T
T -> F*T
T -> F
F -> id
F -> (E)

* has precdence over +
Both operations are left-associative

Customize LR parser as follows:

If we are in state 9 (item set I7) and lookahead is +, then reduce (left-associative). If lookahead is *, then shift (precedence).
If we are in state 10 (item set I8) and lookahead is *, then reduce (left-associative). If lookahead is +, then reduce again (precedence).

This special case can be deal with in yacc by specifying that * has a higher precedence than +.

Example 2 - Dangling else

Recall

Stmt -> if E then Stmt else Stmt
Stmt -> if E then Stmt
Stmt -> other

This grammar is ambiguous. For example, consider

if expr then if expr then other else other

Disambiguation: “Match closest then”.

Stmt -> Match
Stmt -> Unmatch
Stmt -> other

Match -> if E then Match else Match
Match -> other

Unmatch -> if E then Stmt
Unmatch -> if E then Match else Unmatch

This grammar should be accepted by yacc, hence, belongs to the class LALR(1).