This section explains how the tokenized input from Part 1 is parsed using a context-free grammar (CFG). You specify the grammar, and the token stream is processed according to it. Under the hood, `ply.yacc` uses LALR(1) parsing.
```python
# Yacc example

import ply.yacc as yacc

# Get the token map from the lexer. This is required.
from calclex import tokens

def p_expression_plus(p):
    'expression : expression PLUS term'
    p[0] = p[1] + p[3]

def p_expression_minus(p):
    'expression : expression MINUS term'
    p[0] = p[1] - p[3]

def p_expression_term(p):
    'expression : term'
    p[0] = p[1]

def p_term_times(p):
    'term : term TIMES factor'
    p[0] = p[1] * p[3]

def p_term_div(p):
    'term : term DIVIDE factor'
    p[0] = p[1] / p[3]

def p_term_factor(p):
    'term : factor'
    p[0] = p[1]

def p_factor_num(p):
    'factor : NUMBER'
    p[0] = p[1]

def p_factor_expr(p):
    'factor : LPAREN expression RPAREN'
    p[0] = p[2]

# Error rule for syntax errors
def p_error(p):
    print("Syntax error in input!")

# Build the parser
parser = yacc.yacc()

while True:
    try:
        s = input('calc > ')
    except EOFError:
        break
    if not s:
        continue
    result = parser.parse(s)
    print(result)
```
Each grammar rule is defined by a function whose docstring contains the corresponding context-free grammar production. The statements that make up the function body implement the semantic actions of the rule. Each function accepts a single argument `p` that is a sequence containing the values of each grammar symbol in the corresponding rule. The values of `p[i]` are mapped to grammar symbols as shown here:
```python
def p_expression_plus(p):
    'expression : expression PLUS term'
    #     ^          ^         ^    ^
    #    p[0]       p[1]      p[2] p[3]
    p[0] = p[1] + p[3]
```
- For tokens, the "value" of the corresponding `p[i]` is the same as the `p.value` attribute assigned in the lexer module. So `PLUS` will have the value `+`.
- For non-terminals, the value is determined by whatever is placed in `p[0]`. If nothing is placed, the value is `None`.
- `p[-1]` is not the same as `p[3]`: `p` is not a simple list (negative indices like `p[-1]` can specify embedded actions, not discussed here).
- Note that the function can have any name, as long as it is preceded by `p_`.
- A `p_error(p)` rule is defined to catch syntax errors (the same as `yyerror` in yacc/bison).
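Since `p` in `p_error` is the offending token (or `None` if the error occurs at end of input), the handler can report more detail than the bare example above; a minimal sketch:

```python
# Sketch of a more informative error handler: p is the offending
# LexToken, or None if the error is at end of input.
def p_error(p):
    if p:
        print(f"Syntax error at {p.type} (value {p.value!r})")
    else:
        print("Syntax error at end of input!")
```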
Multiple grammar rules can be combined into a single function, which is a good idea if productions have a similar structure.
```python
def p_binary_operators(p):
    '''expression : expression PLUS term
                  | expression MINUS term
       term       : term TIMES factor
                  | term DIVIDE factor'''
    if p[2] == '+':
        p[0] = p[1] + p[3]
    elif p[2] == '-':
        p[0] = p[1] - p[3]
    elif p[2] == '*':
        p[0] = p[1] * p[3]
    elif p[2] == '/':
        p[0] = p[1] / p[3]
```
Character literals can be used instead of tokens.
```python
def p_binary_operators(p):
    '''expression : expression '+' term
                  | expression '-' term
       term       : term '*' factor
                  | term '/' factor'''
    if p[2] == '+':
        p[0] = p[1] + p[3]
    elif p[2] == '-':
        p[0] = p[1] - p[3]
    elif p[2] == '*':
        p[0] = p[1] * p[3]
    elif p[2] == '/':
        p[0] = p[1] / p[3]
```
Of course, the literals must be specified in the lexer module.
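In PLY, single-character literals are declared with the `literals` variable in the lexer module; for the grammar above that would look like:

```python
# In the lexer module (e.g. calclex.py): declare single-character literals.
literals = ['+', '-', '*', '/']
```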
Empty productions have the form `'''symbol : '''`.
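In practice (as the PLY documentation suggests), it is handy to define a named `empty` rule once and use it wherever an empty production is needed:

```python
def p_empty(p):
    'empty :'
    pass

# 'empty' can then appear like any other symbol, e.g.:
#   optexpr : expression
#           | empty
```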
To explicitly set the start symbol, use `start = 'foo'`, where `foo` is some non-terminal.
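For the calculator example above:

```python
# Make 'expression' the top-level rule of the grammar.
start = 'expression'
```

The start symbol can also be passed when building the parser, via `yacc.yacc(start='expression')`.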
Precedence and associativity are set with the `precedence` variable:
```python
precedence = (
    ('nonassoc', 'LESSTHAN', 'GREATERTHAN'),  # Nonassociative operators
    ('left', 'PLUS', 'MINUS'),
    ('left', 'TIMES', 'DIVIDE'),
    ('right', 'UMINUS'),                      # Unary minus operator
)
```
Tokens are ordered from lowest to highest precedence.
`nonassoc` means that those tokens do not associate: something like `a < b < c` is illegal, whereas `a < b` is still legal.
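`UMINUS` in the table above is not an actual token but a fictitious placeholder. A grammar rule can borrow its precedence with `%prec`, which is how unary minus is typically given higher precedence than binary minus; a sketch:

```python
def p_expr_uminus(p):
    'expression : MINUS expression %prec UMINUS'
    # %prec UMINUS makes this rule use UMINUS's (higher) precedence
    # instead of the default precedence of MINUS.
    p[0] = -p[2]
```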
`parser.out` is a debugging file created when the parser is first built with debugging enabled; it describes the full LALR parsing tables, including any conflicts. Whenever a shift/reduce conflict occurs, the parser resolves it by shifting by default.
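A sketch of enabling the debug output (the exact default for `debug` has varied across PLY versions):

```python
# Build the parser and dump the LALR tables, including any
# shift/reduce or reduce/reduce conflicts, to parser.out.
parser = yacc.yacc(debug=True)
```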