The simplest way to write a parser is to use the recursive descent technique. This creates a top-down parser (which may formally be described a LL(1)). To start the example we first have to establish the grammar rules for our language. In this example we will use simple arithmetic expression assignments for expressions that can only use the plus operator:
Assignment --> Identifier := Expression
Expression --> Expression + Term | Term
Term --> Identifier | (Expression)
Identifier --> x | y | z
For each rule of the grammar we can write a procedure to recognise the incoming tokens to the rule. For the purposes of this example we can assume a routine called NextToken
which invokes a lexical analyser to supply the token, and a routine called error
which is used to output an error message. The language used is based on Pascal.
var token:char; (* Updated by NextToken *)
procedure identifier;
begin
if token in ['x','y','z']
then
NextToken
else
error('Identifier expected')
end;
You can see that this code implements the rule for recognising an Identifier
. We can then implement the rule for a Term
similarly:
procedure expression forward;
procedure term;
begin
if token = '('
then
begin
nextToken;
expression;
if token <> ')'
then
error(') expected')
else NextToken
end
else
identifier
end;
When we come to implement the rule for Expression
we have a problem; the first element of the Expression
rule is itself an Expression
. This would cause us to write the following:
procedure expression;
begin
expression;
...
end;
This is directly self-recursive and thus would loop forever. Grammar parsed by top-down algorithms cannot be left-recursive for this reason. An easy way out of this problem is to recast the recursion as iteration in the following way:
Expression --> Term { + Term}*
We can now code up the grammar rule as:
procedure expression;
begin
term;
while token = '+'
do
begin
NextTerm;
term
end
end;
We can now complete the parser with the remaining rule for the assignment:
procedure assignment;
begin
identifier;
if token <> '='
then
error('= expected')
else
begin
NextToken;
expression;
end
end;