Tutorial by Examples | RIP Tutorial

Simple rules

Lexer rules define token types. Their names must start with an uppercase letter to distinguish them from parser rules.

    INTEGER: [0-9]+;
    IDENTIFIER: [a-zA-Z_] [a-zA-Z_0-9]*;
    OPEN_PAREN: '(';
    CLOSE_PAREN: ')';

Basic syntax:

    Syntax | Meaning
    A      | Match lexer rule or fragment named A
    A B    | Match A follow...
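
As a minimal sketch, the four rules above can be placed in a standalone lexer grammar. The grammar name SimpleLexer (which ANTLR expects in a file named SimpleLexer.g4) and the WS rule are assumptions added for illustration. WS uses the skip command so that whitespace between tokens is discarded.

```antlr
lexer grammar SimpleLexer;

// Token type names start with an uppercase letter.
INTEGER     : [0-9]+ ;
IDENTIFIER  : [a-zA-Z_] [a-zA-Z_0-9]* ;   // a letter or '_' followed by letters, digits or '_'
OPEN_PAREN  : '(' ;
CLOSE_PAREN : ')' ;

// Hypothetical addition: discard whitespace so "f(42)" and "f ( 42 )" lex alike.
WS          : [ \t\r\n]+ -> skip ;
```

With this grammar, an input such as max(a1 42) lexes as IDENTIFIER OPEN_PAREN IDENTIFIER INTEGER CLOSE_PAREN.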

ANTLR • Lexer rules in v4

Fragments

Fragments are reusable parts of lexer rules that cannot match on their own: they need to be referenced from a lexer rule.

    INTEGER: DIGIT+
           | '0' [Xx] HEX_DIGIT+
           ;
    fragment DIGIT: [0-9];
    fragment HEX_DIGIT: [0-9A-Fa-f];
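
A sketch of how the same fragments could be shared by more than one token rule. The grammar name NumberLexer and the FLOAT and WS rules are hypothetical additions. DIGIT and HEX_DIGIT never appear as token types of their own; they only serve as building blocks.

```antlr
lexer grammar NumberLexer;

INTEGER : DIGIT+
        | '0' [Xx] HEX_DIGIT+      // hexadecimal form, e.g. 0x1F
        ;

// Hypothetical rule reusing the DIGIT fragment.
FLOAT   : DIGIT+ '.' DIGIT* ;

// Fragments: referenced by the rules above, never emitted as tokens.
fragment DIGIT     : [0-9] ;
fragment HEX_DIGIT : [0-9A-Fa-f] ;

WS      : [ \t\r\n]+ -> skip ;
```

For the input 0x1F 3.14 7 this lexer produces INTEGER, FLOAT, INTEGER: 3.14 becomes a FLOAT because that rule matches more characters than INTEGER does.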

Implicit lexer rules

When tokens like '{' are used in a parser rule, an implicit lexer rule will be created for them unless an explicit rule exists. In other words, if you have a lexer rule:

    OPEN_BRACE: '{';

then both of these parser rules are equivalent:

    parserRule: '{';
    parserRule: OPEN_BRACE;

But if the OPE...
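
The equivalence can be seen in a small combined grammar. This is a sketch: the grammar name Braces and the rules block, CLOSE_BRACE, IDENTIFIER and WS are assumptions for illustration. Because OPEN_BRACE is defined explicitly as '{', the literal '{' in block refers to OPEN_BRACE rather than to a new implicit token.

```antlr
grammar Braces;

// The literal '{' and the token name CLOSE_BRACE can be mixed freely here,
// since '{' resolves to the explicit OPEN_BRACE rule below.
block : '{' IDENTIFIER* CLOSE_BRACE ;

OPEN_BRACE  : '{' ;
CLOSE_BRACE : '}' ;
IDENTIFIER  : [a-zA-Z_]+ ;
WS          : [ \t\r\n]+ -> skip ;
```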

Priority rules

Several lexer rules can match the same input text. In that case, the token type will be chosen as follows:

1. First, select the lexer rule which matches the longest input.
2. If the text matches an implicitly defined token (like '{'), use the implicit rule.
3. If several lexer rules match the same input l...
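
A small combined grammar illustrates the first two criteria. The grammar name IfStat and the rules stat, ID and WS are hypothetical. The literal 'if' in stat creates an implicit token, while ID is an explicit rule that can also match the text if.

```antlr
grammar IfStat;

// 'if' here defines an implicit token.
stat : 'if' ID ;

// ID can match "if" too, as well as longer words such as "iffy".
ID   : [a-z]+ ;
WS   : [ \t\r\n]+ -> skip ;
```

For the input if iffy, the first word is matched by both the implicit 'if' token and ID with the same length, so the implicit rule is used. For iffy, ID matches four characters against two for 'if', so the longest match wins and an ID token is emitted.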

Lexer commands

A lexer rule can have associated commands:

    WHITESPACE: [ \r\n] -> skip;

Commands are defined after a -> at the end of the rule.

- skip: Skips the matched text; no token will be emitted
- channel(n): Emits the token on a different channel
- type(n): Changes the emitted token type
- mode(n), pu...
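
The following lexer grammar sketches skip, channel, type and the mode-related commands together. pushMode and popMode are standard ANTLR 4 lexer commands; the grammar name CommandsLexer, the STRING token, the TAG mode and all rule names are assumptions. Lexical modes are only allowed in a lexer grammar, not in a combined grammar, which is why this example is declared as a lexer grammar.

```antlr
lexer grammar CommandsLexer;

// STRING has no rule of its own; it is declared so that type() can target it.
tokens { STRING }

WHITESPACE : [ \t\r\n]+ -> skip ;                  // no token emitted
COMMENT    : '//' ~[\r\n]* -> channel(HIDDEN) ;    // token sent to the HIDDEN channel
DQ_STRING  : '"' ~["\r\n]* '"' -> type(STRING) ;   // emitted with type STRING
SQ_STRING  : '\'' ~['\r\n]* '\'' -> type(STRING) ; // also emitted with type STRING
OPEN_TAG   : '<' -> pushMode(TAG) ;                // enter TAG mode
ID         : [a-zA-Z_]+ ;

mode TAG;
TAG_NAME   : [a-zA-Z]+ ;
CLOSE_TAG  : '>' -> popMode ;                      // return to the default mode
```

Comments end up on the HIDDEN channel, so a parser reading the default channel does not see them, yet they remain in the token stream. Both quote styles produce tokens of type STRING.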

Actions and semantic predicates

A lexer action is a block of arbitrary code in the target language surrounded by {...}, which is executed during matching:

    IDENTIFIER: [A-Z]+ { log("matched rule"); };

A semantic predicate is a block of arbitrary code in the target language surrounded by {...}?, which evaluates to a bo...
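
A sketch combining an action and a semantic predicate, assuming the Java target (ANTLR's default). The grammar name PredLexer, the enumIsKeyword flag, the ENUM rule and the printed message are hypothetical; getText() is the Java runtime's Lexer method that returns the text of the current token.

```antlr
lexer grammar PredLexer;

@members {
    // Hypothetical switch; an application would set it before lexing.
    public boolean enumIsKeyword = true;
}

// Predicate: ENUM only matches while enumIsKeyword is true.
// Listed before IDENTIFIER on purpose, since both can match the text "enum".
ENUM       : 'enum' { enumIsKeyword }? ;

// Action: runs Java code when an IDENTIFIER is matched.
IDENTIFIER : [a-zA-Z_] [a-zA-Z_0-9]* { System.out.println("matched " + getText()); } ;

WS         : [ \t\r\n]+ -> skip ;
```

When enumIsKeyword is false, the predicate fails for ENUM and the word enum is lexed as an IDENTIFIER instead.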