Several lexer rules can match the same input text. In that case, the token type will be chosen as follows:
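First, select the lexer rule that matches the longest input.
If the text matches an implicitly defined token (like '{'), use the implicit rule.
If several lexer rules match the same input length, choose the first one, based on definition order.
The following combined grammar: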
grammar LexerPriorityRulesExample;
// Parser rules
randomParserRule: 'foo'; // Implicitly declared token type
// Lexer rules
BAR: 'bar';
IDENTIFIER: [A-Za-z]+;
BAZ: 'baz';
WS: [ \t\r\n]+ -> skip;
Given the following input:
aaa foo bar baz barz
The lexer will produce the following token sequence:
IDENTIFIER 'foo' BAR IDENTIFIER IDENTIFIER
aaa is of type IDENTIFIER
Only the IDENTIFIER rule can match this token, so there is no ambiguity.
foo is of type 'foo'
The parser rule randomParserRule introduces the implicit 'foo' token type, which takes priority over the IDENTIFIER rule.
bar is of type BAR
This text matches both the BAR rule and the IDENTIFIER rule, with the same length. BAR is defined before IDENTIFIER and therefore takes precedence.
baz is of type IDENTIFIER
This text matches the BAZ rule, but it also matches the IDENTIFIER rule. The latter is chosen because it is defined before BAZ.
Given this grammar, BAZ will never be able to match, as the IDENTIFIER rule already covers everything BAZ can match.
barz is of type IDENTIFIER
The BAR rule can match the first 3 characters of this string (bar), but the IDENTIFIER rule will match 4 characters. As IDENTIFIER matches a longer substring, it is chosen over BAR.
As a rule of thumb, specific rules should be defined before more generic rules. If a rule can only match input that is already covered by a previously defined rule, it will never be used.
Implicitly defined rules such as 'foo' act as if they were defined before all other lexer rules.
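The selection rules above can be illustrated with a small simulation. This is only a sketch in plain Python, not ANTLR itself or its runtime API: the RULES table, the SKIPPED set and the tokenize function are hypothetical names made up for this example, and the regular expressions are hand-written stand-ins for the grammar's lexer rules. The implicit 'foo' token is placed first in the table to mimic the behaviour described above.

```python
import re

# Hypothetical stand-in for the grammar: (token type, regex) in definition order.
# The implicit 'foo' token comes first, mirroring the statement that implicit
# tokens act as if they were defined before all other lexer rules.
RULES = [
    ("'foo'", r"foo"),
    ("BAR", r"bar"),
    ("IDENTIFIER", r"[A-Za-z]+"),
    ("BAZ", r"baz"),
    ("WS", r"[ \t\r\n]+"),
]
SKIPPED = {"WS"}  # mimics the '-> skip' lexer command


def tokenize(text, rules=RULES):
    """Return token types using longest match first, then definition order."""
    compiled = [(name, re.compile(pattern)) for name, pattern in rules]
    tokens = []
    pos = 0
    while pos < len(text):
        best_name, best_len = None, 0
        for name, regex in compiled:
            match = regex.match(text, pos)
            # Strict '>' keeps the earliest-defined rule when lengths tie.
            if match and match.end() - pos > best_len:
                best_name, best_len = name, match.end() - pos
        if best_name is None:
            raise ValueError(f"no rule matches at offset {pos}")
        if best_name not in SKIPPED:
            tokens.append(best_name)
        pos += best_len
    return tokens


print(tokenize("aaa foo bar baz barz"))

# Moving BAZ before IDENTIFIER makes it reachable again.
reordered = [RULES[0], RULES[1], RULES[3], RULES[2], RULES[4]]
print(tokenize("baz bazz", reordered))
```

With the original order, baz is reported as IDENTIFIER. With BAZ moved before IDENTIFIER, baz becomes BAZ while bazz is still IDENTIFIER because of the longest-match rule. The simulation relies on each pattern being simple enough that Python's regex engine returns its longest possible match, which holds for these rules but is not true of regular expressions in general.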