Rather than regular expressions, the Lua string library uses its own pattern syntax. The two are similar in places, but Lua patterns are more limited and are written differently. For instance, the character sequence %a
matches any letter, while its upper-case version, %A, matches any non-letter character. All character classes (character sequences that, as a pattern, can match a set of items) are listed below.
Character class | Matches |
---|---|
%a | letters (A-Z, a-z) |
%c | control characters (\n, \t, \r, ...) |
%d | digits (0-9) |
%l | lower-case letters (a-z) |
%p | punctuation characters (!, ?, &, ...) |
%s | space characters |
%u | upper-case letters (A-Z) |
%w | alphanumeric characters (A-Z, a-z, 0-9) |
%x | hexadecimal digits (0-9, A-F, a-f) |
%z | the character with representation 0 |
. | any character |
As mentioned above, the upper-case version of any of these classes represents the complement of the class. For instance, %D
matches any single non-digit character:
string.match("f123", "%D") --> f
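The classes in the table can be tried out directly. The following is a minimal sketch, assuming a standard Lua interpreter with the default C locale; the sample string and variable names are made up for illustration.

```lua
local s = "x=42;"

-- A single class matches exactly one character; string.match returns
-- the first character in the subject that satisfies it.
local first_letter = s:match("%a")  --> x
local first_digit  = s:match("%d")  --> 4
local first_punct  = s:match("%p")  --> =

-- The upper-case form is the complement of the class.
local first_non_letter = s:match("%A")  --> =
local first_non_digit  = s:match("%D")  --> x

-- string.gsub returns the number of substitutions as its second result,
-- which gives a quick way to count the characters belonging to a class.
local _, digit_count = s:gsub("%d", "")  --> 2

print(first_letter, first_digit, first_punct)
print(first_non_letter, first_non_digit, digit_count)
```

Note that each class on its own consumes only one character; matching a run of characters needs one of the repetition modifiers.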
In addition to character classes, some characters (often called magic characters) have special meanings in patterns:
( ) % . + - * [ ? ^ $
The character % works as an escape for these characters, so %? matches a question mark and %% matches the percent sign. You can use % before any other non-alphanumeric character as well. The % escape has meaning only inside patterns, however: to the Lua parser a pattern is an ordinary string. If you need to put, for instance, a quote inside a pattern, you must escape it with a backslash (\), which is the escape character for Lua string literals.
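The difference between the two kinds of escaping can be sketched as follows. This is an illustrative example only, assuming a standard Lua interpreter; the sample strings and variable names are hypothetical.

```lua
-- %. matches a literal dot, while an unescaped . matches any character.
local literal_dot = ("lua-5.4"):match("%d%.%d")  --> 5.4
local any_char    = ("5x4"):match("%d.%d")       --> 5x4
local no_match    = ("5x4"):match("%d%.%d")      --> nil (no literal dot)

-- %% matches a literal percent sign.
local percent = ("100% done"):match("%d+%%")     --> 100%

-- A double quote is not a magic character, so the pattern needs no %.
-- The backslash is consumed by the Lua parser, not by the pattern engine.
local quoted = ('say "hi"'):match("\"%a+\"")     --> "hi"

print(literal_dot, any_char, no_match, percent, quoted)
```

Choosing a different string delimiter (single quotes or long brackets) is another way to avoid the backslash; either way, the pattern engine sees a plain quote character.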
A character set, written inside square brackets ([]), lets you create your own character class by combining classes and single characters:
local foo = "bar123bar2341"
print(foo:match "[arb]") --> b
You can get the complement of the character set by starting it with ^:
local foo = "bar123bar2341"
print(string.match(foo, "[^bar]")) --> 1
In this example, string.match
finds the first character that isn't b, a or r.
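Sets can also mix classes with single characters, as described above. A small sketch, assuming a standard Lua interpreter; the sample strings and variable names are invented for illustration.

```lua
-- [%a_] matches one character that is either a letter or an underscore.
local first_ident_char = ("42_id"):match("[%a_]")   --> _

-- [^%s%d] matches one character that is neither whitespace nor a digit.
local first_unit_char = ("  42 kg"):match("[^%s%d]")  --> k

-- Inside a set, %- is a literal hyphen. The outer parentheses keep only
-- the first result of gsub (the new string) and drop the match count.
local dotted = (("a-b_c d"):gsub("[%-_%s]", "."))   --> a.b.c.d

print(first_ident_char, first_unit_char, dotted)
```

Escaping the hyphen with % inside the set avoids any ambiguity with range notation.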
Patterns become more useful with repetition and optional modifiers. Lua patterns offer these four:
Character | Modifier |
---|---|
+ | One or more repetitions |
* | Zero or more repetitions (longest match) |
- | Zero or more repetitions (shortest match) |
? | Optional (zero or one occurrence) |
The modifier +
matches one or more characters of the class it follows, and it always returns the longest matching sequence:
local foo = "12345678bar123"
print(foo:match "%d+") --> 12345678
The modifier * is similar to +, but it also accepts zero occurrences. It is commonly used to match optional spaces between parts of a pattern.
The modifier - is also similar to *, but instead of matching the longest sequence, it matches the shortest one.
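The difference between the longest and the shortest match is easiest to see side by side. The following is a sketch under the assumption of a standard Lua interpreter; the sample strings and variable names are hypothetical.

```lua
local html = "<b>bold</b> and <i>italic</i>"

-- .* expands as far as possible, so the match runs to the last ">".
local greedy = html:match("<.*>")  --> <b>bold</b> and <i>italic</i>

-- .- expands as little as possible, so the match stops at the first ">".
local lazy = html:match("<.->")    --> <b>

-- %s* accepts zero or more spaces, so both spellings match.
local spaced  = ("a   =b"):match("%a%s*=")  --> a   =
local compact = ("a=b"):match("%a%s*=")     --> a=

-- %s+ requires at least one space, so the compact form fails.
local needs_space = ("a=b"):match("%a%s+=")  --> nil

print(greedy)
print(lazy)
print(spaced, compact, needs_space)
```

The shortest-match modifier is usually the right choice when the pattern has a closing delimiter that may appear more than once in the subject.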
The modifier ?
matches an optional character, allowing you to match, for example, a number with an optional sign:
local foo = "-20"
print(foo:match "[+-]?%d+") --> -20
The Lua pattern-matching engine provides a few additional pattern items:
Character item | Description |
---|---|
%n | for n between 1 and 9, matches a substring equal to the n-th captured string |
%bxy | matches a substring that starts with x, ends with y, and is balanced between them (x and y must be two distinct characters) |
%f[set] | frontier pattern: matches an empty string at any position such that the next character belongs to set and the previous character does not belong to set |
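Each of the three items in the table can be illustrated briefly. This is a minimal sketch, assuming a standard Lua interpreter (the frontier pattern is documented from Lua 5.2 onward); the sample strings and variable names are made up for illustration, and parentheses in a pattern mark captures.

```lua
-- %1 refers back to the first capture, so the closing quote must be
-- the same character as the opening one.
local s = [[say "it's" now]]
local quote, body = s:match("([\"'])(.-)%1")
--> quote is the double-quote character, body is it's

-- %b() matches from an opening parenthesis to its balancing close.
local balanced = ("f(a(b)c) + g(d)"):match("%b()")  --> (a(b)c)

-- %f[%a] marks a transition from non-letter to letter, and %f[%A]
-- marks a transition from letter to non-letter, so only the whole
-- word "the" is replaced.
local replaced, count =
  ("the theme of the other"):gsub("%f[%a]the%f[%A]", "THE")
--> THE theme of THE other    2

print(quote, body)
print(balanced)
print(replaced, count)
```

The frontier pattern consumes no characters itself, which is why it can sit on either side of a word without affecting what gets replaced.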