Basics of Compiler Construction: Simple Parser

Example

This is a simple parser for the token stream of an integer variable declaration, which we created in the previous example, Simple Lexical Analyser. This parser is also written in Python.

What is a parser?

Parsing is the process of converting the source text, usually after a lexer has split it into tokens, into an abstract syntax tree (AST). In a simple compiler the parser may also be in charge of semantic validation, which weeds out syntactically correct statements that make no sense, e.g. unreachable code or duplicate declarations.
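
To make the duplicate-declaration case concrete, here is a minimal sketch of such a check. The helper name find_duplicate_declarations and the list-of-dicts shape of the declarations are assumptions made for illustration; they are not part of the parser shown in this example.

```python
def find_duplicate_declarations(declarations):
    """Return the names declared more than once, in the order they are first repeated."""
    seen = set()
    duplicates = []
    for decl in declarations:
        name = decl['name']
        # A name we have already seen is a duplicate; report each one only once
        if name in seen and name not in duplicates:
            duplicates.append(name)
        seen.add(name)
    return duplicates


# Hypothetical, already-parsed declarations for:
#   int result = 100; int count = 1; int result = 5;
declarations = [
    {'type': 'int', 'name': 'result', 'value': '100'},
    {'type': 'int', 'name': 'count', 'value': '1'},
    {'type': 'int', 'name': 'result', 'value': '5'},
]

print(find_duplicate_declarations(declarations))  # ['result']
```

All three statements are syntactically valid on their own; only a check across declarations reveals that result is declared twice.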


Example tokens:

[['DATATYPE', 'int'], ['IDENTIFIER', 'result'], ['OPERATOR', '='], ['INTEGER', '100'], ['END_STATEMENT', ';']]

Code for the parser in Python 3:

ast = { 'VariableDecleration': [] }

tokens = [ ['DATATYPE', 'int'], ['IDENTIFIER', 'result'], ['OPERATOR', '='],  
           ['INTEGER', '100'], ['END_STATEMENT', ';'] ]
           
# Loop through the tokens and form ast
for x in range(0, len(tokens)):
    
    # Create variable for type and value for readability
    token_type  = tokens[x][0]
    token_value = tokens[x][1]
    
    # This will check for the end statement which means the end of var decl
    if token_type == 'END_STATEMENT': break
    
    # This will check for the datatype which should be at the first token
    if x == 0 and token_type == 'DATATYPE':
        ast['VariableDecleration'].append( {'type': token_value} )
    
    # This will check for the name which should be at the second token
    if x == 1 and token_type == 'IDENTIFIER':
        ast['VariableDecleration'].append( {'name': token_value} )
        
    # The equals operator is expected at the third token; it adds nothing to the AST
    if x == 2 and token_value == '=': pass
    
    # This will check for the value which should be at the fourth token
    # (parentheses are needed because 'and' binds tighter than 'or')
    if x == 3 and (token_type == 'INTEGER' or token_type == 'STRING'):
        ast['VariableDecleration'].append( {'value': token_value} )
        
print(ast)

The code above should output this as a result:

{'VariableDecleration': [{'type': 'int'}, {'name': 'result'}, {'value': '100'}]}

As you can see, all the parser does is find a pattern in the source code tokens (in this case, a variable declaration) and create an object that holds its properties, such as type, name and value.
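
The loop above silently skips any token that does not fit the pattern. As a rough sketch of a stricter variant, the version below rejects a token stream that does not match the declaration pattern. The names ParseError and parse_variable_declaration are hypothetical, and restricting the value to an INTEGER token is an assumption of this sketch.

```python
class ParseError(Exception):
    """Raised when the tokens do not form a variable declaration (hypothetical helper)."""


def parse_variable_declaration(tokens):
    """Parse DATATYPE IDENTIFIER '=' INTEGER END_STATEMENT into the AST shape used above."""
    expected = ['DATATYPE', 'IDENTIFIER', 'OPERATOR', 'INTEGER', 'END_STATEMENT']

    if len(tokens) != len(expected):
        raise ParseError('expected %d tokens, got %d' % (len(expected), len(tokens)))

    for position, (token, expected_type) in enumerate(zip(tokens, expected)):
        token_type, token_value = token
        if token_type != expected_type:
            raise ParseError('token %d: expected %s, got %s'
                             % (position, expected_type, token_type))
        # The only operator allowed in a declaration is '='
        if token_type == 'OPERATOR' and token_value != '=':
            raise ParseError("token %d: expected '=', got %r" % (position, token_value))

    return {'VariableDecleration': [
        {'type': tokens[0][1]},
        {'name': tokens[1][1]},
        {'value': tokens[3][1]},
    ]}


tokens = [['DATATYPE', 'int'], ['IDENTIFIER', 'result'], ['OPERATOR', '='],
          ['INTEGER', '100'], ['END_STATEMENT', ';']]
print(parse_variable_declaration(tokens))

# A declaration with the identifier missing is rejected instead of half-parsed
try:
    parse_variable_declaration([['DATATYPE', 'int'], ['OPERATOR', '='],
                                ['INTEGER', '100'], ['END_STATEMENT', ';']])
except ParseError as error:
    print('Parse error:', error)
```

For the valid token list this produces the same AST as the loop above; the design difference is that malformed input raises an error rather than yielding an incomplete AST.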


Let's break it down

  1. We created the ast variable which will hold the complete AST.

  2. We created the tokens variable, which holds the tokens that were created by our lexer and now need to be parsed.

  3. Next, we loop through each token and perform some checks to find certain tokens and form our AST with them.

  4. Inside the loop, we create variables for the token type and value for readability.

  5. We now perform checks like this one:

    if x == 0 and token_type == 'DATATYPE':
        ast['VariableDecleration'].append( {'type': token_value} )

    which looks for a datatype and adds it to the AST. We do the same for the name and the value, which results in a full VariableDecleration AST.
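
The steps above handle a single declaration. As a sketch of how the same checks might be applied to a token stream containing several declarations, the code below first splits the stream at each END_STATEMENT token and then parses each statement separately. The function names split_statements and parse_statement, and the second declaration in the sample tokens, are assumptions made for illustration.

```python
def split_statements(tokens):
    """Group a flat token list into statements, each ending with END_STATEMENT."""
    statements, current = [], []
    for token in tokens:
        current.append(token)
        if token[0] == 'END_STATEMENT':
            statements.append(current)
            current = []
    if current:
        # Trailing tokens without a terminating ';' are kept as a final statement
        statements.append(current)
    return statements


def parse_statement(statement):
    """Apply the same positional checks as the loop above to one statement."""
    node = []
    for x, (token_type, token_value) in enumerate(statement):
        if token_type == 'END_STATEMENT':
            break
        if x == 0 and token_type == 'DATATYPE':
            node.append({'type': token_value})
        if x == 1 and token_type == 'IDENTIFIER':
            node.append({'name': token_value})
        if x == 3 and token_type in ('INTEGER', 'STRING'):
            node.append({'value': token_value})
    return {'VariableDecleration': node}


# Hypothetical token stream for: int result = 100; int count = 5;
tokens = [['DATATYPE', 'int'], ['IDENTIFIER', 'result'], ['OPERATOR', '='],
          ['INTEGER', '100'], ['END_STATEMENT', ';'],
          ['DATATYPE', 'int'], ['IDENTIFIER', 'count'], ['OPERATOR', '='],
          ['INTEGER', '5'], ['END_STATEMENT', ';']]

program = [parse_statement(statement) for statement in split_statements(tokens)]
print(program)
```

Each element of program has the same shape as the single AST printed earlier, so the per-statement logic stays unchanged and only the splitting step is new.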

If you want to interact with this code and play with it, here is a link to the code in an online compiler: https://repl.it/J9IT/latest


