Regular Expressions Greediness versus Laziness


Given the following input:


We will use two patterns: one greedy: A.*Z, and one lazy: A.*?Z. These patterns yield the following matches:

First focus on what A.*Z does. When it matched the first A, the .*, being greedy, then tries to match as many . as possible.

      A.* matched, Z can't match

Since the Z doesn't match, the engine backtracks, and .* must then match one fewer .:

      A.* matched, Z can't match

This happens a few more times, until it finally comes to this:

      A.* matched, Z can now match

Now Z can match, so the overall pattern matches:

      A.*Z matched

By contrast, the reluctant (lazy) repetition in A.*?Z first matches as few . as possible, and then taking more . as necessary. This explains why it finds two matches in the input.

Here's a visual representation of what the two patterns matched:

     \____/l      \______/l      l = lazy
     \_________g_________/       g = greedy

Example based on answer made by polygenelubricants.

The POSIX standard does not include the ? operator, so many POSIX regex engines do not have lazy matching. While refactoring, especially with the "greatest trick ever", may help match in some cases, the only way to have true lazy matching is to use an engine that supports it.