Given the following input:
aaaaaAlazyZgreeedyAlaaazyZaaaaa
We will use two patterns: one greedy: A.*Z
, and one lazy: A.*?Z
. These patterns yield the following matches:
A.*Z
yields 1 match: AlazyZgreeedyAlaaazyZ
(examples: Regex101, Rubular)A.*?Z
yields 2 matches: AlazyZ
and AlaaazyZ
(examples: Regex101, Rubular)First focus on what A.*Z
does. When it matched the first A
, the .*
, being greedy, then tries to match as many .
as possible.
aaaaaAlazyZgreeedyAlaaazyZaaaaa
\________________________/
A.* matched, Z can't match
Since the Z
doesn't match, the engine backtracks, and .*
must then match one fewer .
:
aaaaaAlazyZgreeedyAlaaazyZaaaaa
\_______________________/
A.* matched, Z can't match
This happens a few more times, until it finally comes to this:
aaaaaAlazyZgreeedyAlaaazyZaaaaa
\__________________/
A.* matched, Z can now match
Now Z
can match, so the overall pattern matches:
aaaaaAlazyZgreeedyAlaaazyZaaaaa
\___________________/
A.*Z matched
By contrast, the reluctant (lazy) repetition in A.*?Z
first matches as few .
as possible, and then taking more .
as necessary. This explains why it finds two matches in the input.
Here's a visual representation of what the two patterns matched:
aaaaaAlazyZgreeedyAlaaazyZaaaaa
\____/l \______/l l = lazy
\_________g_________/ g = greedy
Example based on answer made by polygenelubricants.
The POSIX standard does not include the ?
operator, so many POSIX regex engines do not have lazy matching. While refactoring, especially with the "greatest trick ever", may help match in some cases, the only way to have true lazy matching is to use an engine that supports it.