how to interpret "shift/reduce" conflicts in bison

Question

I'm writing a small bison / flex calculator but I'm not able to understand and debug the verbose output
This is the code which is causing the errors:
%token NUM
%token NAME

%%
input:    'n'
          | line 'n'
;

line:     expr                    {printf("%.10gn", $1);}
          | NAME '=' expr         {Assign(table, yyname, $3);}
;

expr:     NUM                     {$$ = $1;}
          | NAME                  {$$ = Lookup(table, yyname);}
          | expr '+' expr         {$$ = $1 + $2;}
          | expr '-' expr         {$$ = $1 - $2;}
          | expr '*' expr         {$$ = $1 * $2;}
          | expr '/' expr         {$$ = $1 / $2;}
          | expr '^' expr         {$$ = pow($1, $2);}
          | '(' expr ')'          {$$ = $2;}

The verbose error output is:
state 20

'+'  shift, and go to state 13
'-'  shift, and go to state 14
'*'  shift, and go to state 15
'/'  shift, and go to state 16
'^'  shift, and go to state 17

'+'       [reduce using rule 7 (expr)]
'-'       [reduce using rule 7 (expr)]
'*'       [reduce using rule 7 (expr)]
'/'       [reduce using rule 7 (expr)]
'^'       [reduce using rule 7 (expr)]
$default  reduce using rule 7 (expr)

...

state 24

'+'  shift, and go to state 13
'-'  shift, and go to state 14
'*'  shift, and go to state 15
'/'  shift, and go to state 16
'^'  shift, and go to state 17

'+'       [reduce using rule 11 (expr)]
'-'       [reduce using rule 11 (expr)]
'*'       [reduce using rule 11 (expr)]
'/'       [reduce using rule 11 (expr)]
'^'       [reduce using rule 11 (expr)]
$default  reduce using rule 11 (expr)

I'm pretty new to using bison so any help on this would be appreciated, thanks in advance

John Bollinger · Answer

Bison operates in part by maintaining a stack of symbols, and repeatedly choosing between

shifting the next symbol from its input to the top of the stack, OR
reducing some number of symbols from the top of the stack to another symbol, according to one of the grammar rules.

It uses the current contents of the stack and the type of the next input symbol (if any), to make that choice.
It is very easy to write grammar rule sets that are ambiguous, in the sense that they afford states where either shifting or reducing could be viable.  This is a "shift/reduce" conflict, and there are many such states in your case.  Consider this input:
1 + 2 * 3

Supposing that tokens 1, 2, and 3 are parsed as NUMs, we have this:
stack                input                         action
(empty)              `NUM` `+` `NUM` `*` `NUM`     shift
`NUM`                `+` `NUM` `*` `NUM`           reduce (1)
`expr`               `+` `NUM` `*` `NUM`           shift (2)
`expr` `+`           `NUM` `*` `NUM`               shift (3)
`expr` `+` `NUM`     `*` `NUM`                     reduce (1)
`expr` `+` `expr`    `*` `NUM`                       !!

And there we have two viable alternatives: Bison could reduce the three symbols at the top of the stack to an expr, or it could shift the * onto the stack.  In fact, although it is not always the case with a shift / reduce conflict, either alternative could lead to a successful parse, but the choice matters because it can -- and in your case does -- affect the interpretation of the input.  Performing the reduction at this point leads to interpreting the input as equivalent to (1 + 2) * 3, whereas performing the shift leads, in a couple more steps, to interpreting the input as equivalent to 1 + (2 * 3).
Without other guidance, Bison defaults to shifting in these cases.  That more often leads to a successful parse, but not always, and sometimes it leads to a successful but wrong parse.  Bison's default will get it right in the above example, but would get it wrong (relative to our usual expectations for operator precedence) with 1 * 2 + 3.
Writing unambiguous grammar rules is a distinctly non-trivial exercise.  In this case, it would require breaking the symmetry of your rules for expressions, so that the left- and right-hand operands are represented by different symbols.  To take operator precedence into account, you will also need to differentiate between different types of expressions based on the top-level operator, if any, with which they are formed.
Notes:
(1) There is no rule for handling a NUM except to reduce it to an expr.
(2) an expr could also be reduced to a line, but Bison can take the next token type into account to see that performing such a reduction would leave it in a state from which it could not proceed.
(3) There is no rule to reduce a group of symbols ending with a +.

zwol · Answer

Chris Dodd's answer tells you what is wrong.  This answer is about how to fix it.
You can fix this by introducing a bunch of sub-productions of the expr production, and this is what you will often see in formal specifications, e.g.
atom:   NUM | NAME | '(' expr ')'
pow:    atom '^' atom
muldiv: pow '*' pow | pow '/' pow
/* etc */

And with more complicated grammars, sometimes you have to do it this way.  But in this case, it's easier to use Bison's operator precedence feature.  Add these lines to the first section of your .y file:
%left '+' '-'
%left '*' '/'
%left '^'

This tells the parser generator that all of the arithmetic operators are left-associative (that is, treat a + b + c as (a + b) + c), that * and / have higher precedence than + and - (that is, treat a + b * c as a + (b * c)), and that ^ has higher precedence still.  This is enough extra information to eliminate all the conflicts.

Chris Dodd · Answer

A "conflict" in bison tells you the grammar in question is not LALR(1) -- there is something about it that can't find a unique parse for an input with only 1 token lookahead.
The verbose grammar file tells you exactly where the conflict is, but to understanbd it, you need to understand how a shift-reduce parser recognizes inputs.  A shift-reduce parser is basically a state machine coupled to a stack (a push-down automata or PDA), where the state machine tracks which possible (partial) rule productions have been recognized so far, and the stack holds states that have been deferred.
In your specific example, you have the state rules:
7 expr: expr . '+' expr
7     | expr '+' expr .
8     | expr . '-' expr
9     | expr . '*' expr
10     | expr . '/' expr
11     | expr . '^' expr

which tells you the state represents having partially recognized any of those rules up to the point where a . appears.  So it might have recognized the entire rule expr + expr or it might be just the initial expr of one of the other rules.  This is indicative of an ambiguity in the grammar.  Whenever you have an input of the form
A + B * C

or
3 + 5 - 7

it can recognize it as two binary operations, but it doesn't know which one binds tighter -- should it be (A + B) * C or A + (B * C)

how to interpret "shift/reduce" conflicts in bison

3 Answers

Add your own answers!

Ask a Question