Instead of regular expressions, the Lua string library uses its own pattern syntax. The two are similar, but Lua patterns are more limited and use a different syntax. For instance, the character sequence %a matches any letter, while its upper-case version, %A, matches any non-letter character. All character classes (character sequences that, as a pattern, match any one of a set of characters) are listed below.
| Character class | Matches |
|---|---|
| %a | letters (A-Z, a-z) |
| %c | control characters (\n, \t, \r, ...) |
| %d | digits (0-9) |
| %l | lower-case letters (a-z) |
| %p | punctuation characters (!, ?, &, ...) |
| %s | space characters |
| %u | upper-case letters (A-Z) |
| %w | alphanumeric characters (A-Z, a-z, 0-9) |
| %x | hexadecimal digits (0-9, A-F, a-f) |
| %z | the character with representation 0 (deprecated since Lua 5.2, where patterns can contain "\0" directly) |
| . | any character |
As mentioned above, the upper-case version of any of these classes represents the complement of the class. For instance, %D matches any single non-digit character:
string.match("f123", "%D") --> f
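The classes and their complements can be tried out with a few one-liners. This is a minimal sketch, assuming a standard Lua interpreter; the sample string is made up for illustration.

```lua
-- Each class matches exactly one character, so match returns a
-- single-character string (the first one in the subject that fits).
local s = "Hello, World 42!"

print(s:match("%a"))  --> H    (first letter)
print(s:match("%d"))  --> 4    (first digit)
print(s:match("%p"))  --> ,    (first punctuation character)
print(s:match("%A"))  --> ,    (first character that is NOT a letter)
print(s:find("%s"))   --> 7 7  (start and end position of the first space)
```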
In addition to character classes, some characters have special functions as patterns:
( ) % . + - * [ ? ^ $
The character % works as an escape for these special characters, so %? matches a question mark and %% matches a percent sign. You can use % to escape any non-alphanumeric character, not only the special ones. Quotes are not special in patterns; if you need one inside a Lua string literal, put \ before it, which is the escape character of Lua strings rather than of patterns.
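The difference between the pattern escape and the string escape can be sketched as follows. The sample strings are hypothetical; the fourth argument of string.find used here is the standard "plain" flag, which turns pattern matching off altogether.

```lua
local price = "50% off? (today only)"

print(price:match("%d+%%"))      --> 50%    (%% is a literal percent sign)
print(price:match("off%?"))      --> off?   (%? is a literal question mark)
print(price:find("%("))          --> 10 10  (%( is a literal parenthesis)
print(price:find("(", 1, true))  --> 10 10  (plain search, no escaping needed)

-- The backslash belongs to the Lua string literal, not to the pattern:
-- it only lets the double quote appear inside a double-quoted string.
local quoted = "say \"hi\""
print(quoted:match("\"%a+\""))   --> "hi"
```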
A character set, represented inside square brackets ([]), allows you to create a special character class, combining different classes and single characters:
local foo = "bar123bar2341"
print(foo:match "[arb]") --> b
You can get the complement of the character set by starting it with ^:
local foo = "bar123bar2341"
print(string.match(foo, "[^bar]")) --> 1
In this example, string.match will find the first occurrence that isn't b, a or r.
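Sets can also mix classes, single characters and ranges such as 0-9 or A-F. The following sketch uses a made-up sample string to show a few combinations.

```lua
local s = "id_42: 0xBEEF"

print(s:match("[%a_]"))   --> i  (a letter or an underscore)
print(s:match("[0-9]"))   --> 4  (a range, equivalent to %d here)
print(s:match("[^%w_]"))  --> :  (anything except alphanumerics and underscore)
print(s:match("[A-F]"))   --> B  (an upper-case letter between A and F)
```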
Patterns become more useful with the help of repetition and optional modifiers. Lua patterns offer these four:
| Character | Modifier |
|---|---|
| + | One or more repetitions |
| * | Zero or more repetitions |
| - | Zero or more repetitions (shortest match) |
| ? | Optional (zero or one occurrence) |
The modifier + matches one or more characters of the preceding class and always returns the longest possible match:
local foo = "12345678bar123"
print(foo:match "%d+") --> 12345678
The modifier * is similar to +, but it also accepts zero occurrences of the class. It is commonly used to match optional spaces between parts of a pattern.
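As a small illustration of optional spaces, the sketch below matches a hypothetical key/value line with and without spaces around the equals sign, and shows that * can succeed on an empty match where + fails.

```lua
local pattern = "%a+%s*=%s*%a+"

print(("key   =   value"):match(pattern))  --> key   =   value
print(("key=value"):match(pattern))        --> key=value

-- With zero digits available, * still matches (the empty string at
-- position 1), while + finds nothing and returns nil.
print(("abc"):match("%d*") == "")  --> true
print(("abc"):match("%d+"))        --> nil
```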
The modifier - is similar to *, but instead of returning the longest matched sequence, it matches the shortest one.
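The difference between the longest and the shortest match is easiest to see on delimited text. This is a minimal sketch with a made-up string containing several angle-bracket tags.

```lua
local html = "<b>bold</b> and <i>italic</i>"

-- .* expands as far as possible, up to the last ">" in the string.
print(html:match("<.*>"))  --> <b>bold</b> and <i>italic</i>

-- .- expands as little as possible, stopping at the first ">".
print(html:match("<.->"))  --> <b>
```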
The modifier ? matches an optional character, allowing you to match, for example, a number with an optional sign:
local foo = "-20"
print(foo:match "[+-]?%d+") --> -20
Lua's pattern-matching engine provides a few additional pattern items:
| Character item | Description |
|---|---|
| %n | for n between 1 and 9, matches a substring equal to the n-th captured string |
| %bxy | matches a substring between two distinct characters (balanced pair of x and y) |
| %f[set] | frontier pattern: matches an empty string at any position such that the next character belongs to set and the previous character does not belong to set |
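The three items above can be sketched with one short example each. The sample strings are hypothetical, and the examples rely on captures, which are written as parentheses around part of a pattern.

```lua
-- %1 must repeat whatever the first capture matched, so the closing
-- quote has to be the same kind of quote as the opening one.
local s = [[he said "it's fine" and left]]
local quote, body = s:match("([\"'])(.-)%1")
print(quote)  --> "
print(body)   --> it's fine

-- %b() matches from "(" to its balancing ")", including nested pairs.
print(("f(a, g(b)) + 1"):match("%b()"))  --> (a, g(b))

-- %f[%a] marks a position where a letter follows a non-letter, and
-- %f[%A] marks the reverse, so together they isolate the whole word
-- "cat" and skip the "cat" inside "concat".
print(("concat cat"):find("%f[%a]cat%f[%A]"))  --> 8 10
```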