This is a simple parser that parses the integer variable declaration token stream we created in the previous example, Simple Lexical Analyser. Like the lexer, this parser is written in Python.
Parsing is the process of converting the source text (more precisely, the tokens the lexer produced from it) into an abstract syntax tree (AST). The parser can also be in charge of semantic validation, that is, weeding out syntactically correct statements that make no sense, e.g. unreachable code or duplicate declarations.
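As a minimal sketch of the duplicate-declaration check mentioned above, the function below assumes that each declaration has already been parsed into a dict with 'type', 'name' and 'value' keys. That shape and the name find_duplicate_declarations are hypothetical and differ from the AST built later in this example.

```python
def find_duplicate_declarations(declarations):
    """Return the names declared more than once, in order of first repeat.

    Hypothetical input shape: a list of dicts such as
    {'type': 'int', 'name': 'result', 'value': '100'}.
    """
    seen = set()
    duplicates = []
    for decl in declarations:
        name = decl['name']
        # A name we have already seen is a duplicate declaration
        if name in seen and name not in duplicates:
            duplicates.append(name)
        seen.add(name)
    return duplicates


program = [
    {'type': 'int', 'name': 'result', 'value': '100'},
    {'type': 'int', 'name': 'total', 'value': '5'},
    {'type': 'int', 'name': 'result', 'value': '7'},
]
print(find_duplicate_declarations(program))  # ['result']
```

A syntactically valid program can still fail this check, which is why it belongs to semantic validation rather than to matching the token pattern.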
Example tokens:
[['DATATYPE', 'int'], ['IDENTIFIER', 'result'], ['OPERATOR', '='], ['INTEGER', '100'], ['END_STATEMENT', ';']]
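Each token is a [type, value] pair, so a variable declaration corresponds to a fixed sequence of token types. The sketch below checks that sequence, assuming the token type names shown above and allowing the value to be an INTEGER or a STRING (as the parser below also accepts). DECLARATION_PATTERN and matches_declaration are illustrative names, not part of the original example, and only token types are compared, not values.

```python
# Expected token types for a declaration such as `int result = 100;`.
# Each position holds the set of token types allowed there.
DECLARATION_PATTERN = [
    {'DATATYPE'},
    {'IDENTIFIER'},
    {'OPERATOR'},
    {'INTEGER', 'STRING'},
    {'END_STATEMENT'},
]


def matches_declaration(tokens):
    """Return True if the token types follow DECLARATION_PATTERN exactly."""
    if len(tokens) != len(DECLARATION_PATTERN):
        return False
    # Unpack each [type, value] token and compare only its type
    return all(token_type in allowed
               for (token_type, _value), allowed in zip(tokens, DECLARATION_PATTERN))


tokens = [['DATATYPE', 'int'], ['IDENTIFIER', 'result'], ['OPERATOR', '='],
          ['INTEGER', '100'], ['END_STATEMENT', ';']]
print(matches_declaration(tokens))  # True
```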
Code for the parser in Python 3:
# The AST starts with an empty list of declaration properties
ast = {'VariableDecleration': []}
tokens = [['DATATYPE', 'int'], ['IDENTIFIER', 'result'], ['OPERATOR', '='],
          ['INTEGER', '100'], ['END_STATEMENT', ';']]

# Loop through the tokens and form the AST
for x in range(len(tokens)):
    # Create variables for the type and value for readability
    token_type = tokens[x][0]
    token_value = tokens[x][1]

    # The end statement marks the end of the variable declaration
    if token_type == 'END_STATEMENT':
        break

    # The datatype should be the first token
    if x == 0 and token_type == 'DATATYPE':
        ast['VariableDecleration'].append({'type': token_value})

    # The name should be the second token
    if x == 1 and token_type == 'IDENTIFIER':
        ast['VariableDecleration'].append({'name': token_value})

    # The equals operator should be the third token; stop if it is missing
    if x == 2 and token_value != '=':
        raise SyntaxError("Expected '=' after the variable name")

    # The value should be the fourth token; `in` avoids the precedence
    # bug of `x == 3 and A or B`, which reads as `(x == 3 and A) or B`
    if x == 3 and token_type in ('INTEGER', 'STRING'):
        ast['VariableDecleration'].append({'value': token_value})

print(ast)
Running this code should print the following:
{'VariableDecleration': [{'type': 'int'}, {'name': 'result'}, {'value': '100'}]}
As you can see, all the parser does is look through the source code tokens for a pattern (in this case, a variable declaration) and create an object from it that holds its properties: type, name and value.
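The AST above stores each property as a separate one-key dict. If a single object holding type, name and value together is more convenient, one possible sketch is to merge those dicts after parsing; the name declaration is our own choice, and the snippet relies on dicts preserving insertion order (Python 3.7 and later).

```python
# The AST as printed by the parser above
ast = {'VariableDecleration': [{'type': 'int'}, {'name': 'result'}, {'value': '100'}]}

# Merge the one-key dicts into a single declaration object
declaration = {}
for prop in ast['VariableDecleration']:
    declaration.update(prop)

print(declaration)  # {'type': 'int', 'name': 'result', 'value': '100'}
```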
We created the ast variable, which will hold the complete AST.
We created the tokens variable, which holds the example tokens produced by our lexer that now need to be parsed.
Next, we loop through each token and perform some checks to find certain tokens and build our AST from them.
Inside the loop, we store each token's type and value in the variables token_type and token_value for readability.
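An alternative that gives the same readability without indexing into each token is to let enumerate() supply the position and unpack the [type, value] pair directly in the for statement. This is a small sketch of that idiom, not the code used in the original example; rows only collects what the loop sees.

```python
tokens = [['DATATYPE', 'int'], ['IDENTIFIER', 'result'], ['OPERATOR', '='],
          ['INTEGER', '100'], ['END_STATEMENT', ';']]

rows = []
# enumerate() yields (index, token); each token is unpacked into its
# type and value, replacing tokens[x][0] and tokens[x][1]
for x, (token_type, token_value) in enumerate(tokens):
    rows.append((x, token_type, token_value))

print(rows[0])  # (0, 'DATATYPE', 'int')
```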
We now perform checks like this one:
if x == 0 and token_type == 'DATATYPE':
    ast['VariableDecleration'].append({'type': token_value})
which looks for a datatype and adds it to the AST. We do the same for the name and the value, which together result in a complete VariableDecleration AST.
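The checks above silently skip a token that does not fit. A stricter variant, sketched below as a swapped-in technique (a position cursor with an expect helper, in the style of a recursive descent parser), consumes the tokens in order and raises SyntaxError as soon as one is unexpected. The function and helper names are hypothetical; the VariableDecleration key keeps the original spelling so the output matches the AST above.

```python
def parse_variable_declaration(tokens):
    """Parse `DATATYPE IDENTIFIER = (INTEGER|STRING) ;` or raise SyntaxError."""
    position = 0

    def expect(*allowed_types, value=None):
        # Consume the next token if its type (and optionally its value) matches
        nonlocal position
        if position >= len(tokens):
            raise SyntaxError(f"Unexpected end of input, expected {' or '.join(allowed_types)}")
        token_type, token_value = tokens[position]
        if token_type not in allowed_types or (value is not None and token_value != value):
            raise SyntaxError(f"Unexpected token {token_value!r} at position {position}")
        position += 1
        return token_value

    var_type = expect('DATATYPE')
    name = expect('IDENTIFIER')
    expect('OPERATOR', value='=')
    value = expect('INTEGER', 'STRING')
    expect('END_STATEMENT')
    return {'VariableDecleration': [{'type': var_type}, {'name': name}, {'value': value}]}


tokens = [['DATATYPE', 'int'], ['IDENTIFIER', 'result'], ['OPERATOR', '='],
          ['INTEGER', '100'], ['END_STATEMENT', ';']]
print(parse_variable_declaration(tokens))
# {'VariableDecleration': [{'type': 'int'}, {'name': 'result'}, {'value': '100'}]}

try:
    parse_variable_declaration([['DATATYPE', 'int'], ['INTEGER', '100']])
except SyntaxError as error:
    print(error)  # Unexpected token '100' at position 1
```

Because each call to expect advances the cursor, the order of the calls encodes the grammar of the declaration directly, instead of comparing every token against a hard-coded index.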
If you want to interact with this code and experiment with it, here is a link to the code in an online compiler: https://repl.it/J9IT/latest