Lua pattern matching


Rather than regular expressions, the Lua string library uses its own pattern syntax. The two are similar in many ways, but Lua patterns are more limited and use a different notation. For instance, the character sequence %a matches any letter, while its upper-case version, %A, matches any non-letter character. All character classes (character sequences that, as a pattern, match any one of a set of characters) are listed below.

Character class    Matches
%a                 letters (A-Z, a-z)
%c                 control characters (\n, \t, \r, ...)
%d                 digits (0-9)
%l                 lower-case letters (a-z)
%p                 punctuation characters (!, ?, &, ...)
%s                 space characters
%u                 upper-case letters (A-Z)
%w                 alphanumeric characters (A-Z, a-z, 0-9)
%x                 hexadecimal digits (0-9, A-F, a-f)
%z                 the character with representation 0 (Lua 5.1; later versions accept \0 directly in patterns)
.                  any character

As mentioned above, the upper-case version of each class represents its complement. For instance, %D matches any single non-digit character:

string.match("f123", "%D")          --> f

In addition to character classes, some characters have a special meaning in patterns:

( ) % . + - * [ ] ? ^ $

The character % acts as an escape in patterns: %? matches a literal question mark and %% matches a literal percent sign. You can put % before any non-alphanumeric character to match it literally, whether or not it is special. Quotes are not special in patterns, so they need no %; to write a quote inside a Lua string literal, precede it with a backslash (\"), which is the escape character of Lua strings rather than of patterns.
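The escaping rules above can be sketched as follows. The sample strings and variable names are made up for illustration; only string.match from the standard library is used.

```lua
-- A literal dot must be escaped, otherwise "." would match any character.
local version = "lua-5.4"
print(version:match("%d%.%d"))      --> 5.4

-- "%%" matches a literal percent sign.
local progress = "done: 75%"
print(progress:match("%d+%%"))      --> 75%

-- The backslash belongs to the string literal, not to the pattern:
-- "\"" is simply a one-character string holding a double quote.
local quoted = 'say "hi"'
print(quoted:match("\"(%a+)\""))    --> hi
```

Writing the last pattern with single quotes, '"(%a+)"', avoids the backslashes altogether.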

A character set, represented inside square brackets ([]), allows you to create a special character class, combining different classes and single characters:

local foo = "bar123bar2341"
print(foo:match "[arb]")            --> b

You can get the complement of the character set by starting it with ^:

local foo = "bar123bar2341"
print(string.match(foo, "[^bar]"))  --> 1

In this example, string.match will find the first occurrence that isn't b, a or r.
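Sets become more useful when they mix classes, single characters and ranges (two characters separated by -). The following is a minimal sketch on an invented input line; the variable names are hypothetical.

```lua
-- Assumed input, chosen only for illustration.
local line = "  _count2 = 0x1F"

-- [%a_] is a letter or an underscore;
-- [%w_]* is any run of letters, digits or underscores.
local name = line:match("[%a_][%w_]*")
print(name)                         --> _count2

-- A range inside a set: [0-9A-Fa-f] matches the same characters as %x.
local hex = line:match("0x([0-9A-Fa-f]+)")
print(hex)                          --> 1F
```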

Patterns become more useful with the help of repetition and optional modifiers. Lua patterns offer these four:

+    one or more repetitions (longest match)
*    zero or more repetitions (longest match)
-    zero or more repetitions (shortest match)
?    optional (zero or one occurrence)

The modifier + matches one or more characters of the preceding class and always matches the longest possible sequence:

local foo = "12345678bar123"
print(foo:match "%d+")  --> 12345678

The modifier * is similar to +, but it also accepts zero occurrences. It is commonly used to match optional whitespace between other parts of a pattern.
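As an illustration of the optional-whitespace use case, the sketch below parses a key=value pair whether or not spaces surround the equals sign. The input strings and names are assumptions made for the example.

```lua
-- %s* accepts zero or more space characters on each side of "=".
local pattern = "(%w+)%s*=%s*(%w+)"

local k1, v1 = ("key=value"):match(pattern)
local k2, v2 = ("key   =  value"):match(pattern)

print(k1, v1)                       --> key     value
print(k2, v2)                       --> key     value
```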

The character - is also similar to *, but instead of returning the longest matched sequence, it matches the shortest one.
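The difference between the longest and the shortest match is easiest to see side by side. This is a small sketch on a made-up string:

```lua
local html = "<b>bold</b> and <i>italic</i>"

-- ".*" grabs as much as possible: from the first "<" to the last ">".
local greedy = html:match("<.*>")
print(greedy)                       --> <b>bold</b> and <i>italic</i>

-- ".-" grabs as little as possible: from the first "<" to the next ">".
local lazy = html:match("<.->")
print(lazy)                         --> <b>
```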

The modifier ? matches an optional character, allowing you to match, for example, a number with an optional sign:

local foo = "-20"
print(foo:match "[+-]?%d+")         --> -20

The Lua pattern matching engine provides a few additional pattern items:

Pattern item    Description
%n              for n between 1 and 9, matches a substring equal to the n-th captured string
%bxy            matches a substring that starts with x, ends with y, and in which x and y are balanced (x and y are two distinct characters)
%f[set]         frontier pattern: matches an empty string at any position such that the next character belongs to set and the previous character does not belong to set
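A short sketch of each of these three items follows. The sample strings and variable names are invented for illustration; for the frontier pattern, the start and end of the subject are treated as if they were the character \0.

```lua
-- %1 refers back to the first capture: the closing quote must be
-- the same character as the opening one.
local s = [[say "it's here" now]]
local quote, body = s:match("([\"'])(.-)%1")
print(body)                         --> it's here

-- %b() matches from an opening parenthesis to its balancing closing one.
local expr = "f(a, g(b)) + 1"
local balanced = expr:match("%b()")
print(balanced)                     --> (a, g(b))

-- %f[%w] and %f[%W] mark the start and end of a word, so only the
-- standalone "cat" is replaced, not the one inside "concatenate".
local text = "cat concatenate cat"
local replaced = text:gsub("%f[%w]cat%f[%W]", "dog")
print(replaced)                     --> dog concatenate dog
```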