Regular Expressions: Greedy and Lazy Quantifiers: Greediness versus Laziness


Example

Given the following input:

aaaaaAlazyZgreeedyAlaaazyZaaaaa

We will use two patterns: a greedy one, A.*Z, and a lazy one, A.*?Z. These patterns yield the following matches:

A.*Z yields 1 match: AlazyZgreeedyAlaaazyZ
A.*?Z yields 2 matches: AlazyZ and AlaaazyZ

First focus on what A.*Z does. Once it has matched the first A, the .*, being greedy, tries to match as many . as possible.

aaaaaAlazyZgreeedyAlaaazyZaaaaa
     \________________________/
      A.* matched, Z can't match

Since the Z doesn't match, the engine backtracks, and .* must then match one fewer .:

aaaaaAlazyZgreeedyAlaaazyZaaaaa
     \_______________________/
      A.* matched, Z can't match

This happens a few more times, until it finally comes to this:

aaaaaAlazyZgreeedyAlaaazyZaaaaa
     \__________________/
      A.* matched, Z can now match

Now Z can match, so the overall pattern matches:

aaaaaAlazyZgreeedyAlaaazyZaaaaa
     \___________________/
      A.*Z matched
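
The backtracking steps above can be sketched by hand in Python. This is only an illustrative simplification, not how any real regex engine is implemented: the helper name greedy_a_dotstar_z and its attempt counter are made up for this example, and the pattern A.*Z is hard-coded into the function.

```python
def greedy_a_dotstar_z(text, start):
    """Hand-rolled sketch of matching A.*Z at position `start`.

    Returns (matched_text, attempts) or None. Hypothetical helper,
    written only to illustrate greedy backtracking.
    """
    if start >= len(text) or text[start] != "A":
        return None
    attempts = 0
    # `end` is the position where .* stops and Z must match.
    # Greedy: start with .* consuming the rest of the input,
    # then give back one character per failed attempt.
    for end in range(len(text), start, -1):
        attempts += 1
        if end < len(text) and text[end] == "Z":
            return text[start:end + 1], attempts
    return None


text = "aaaaaAlazyZgreeedyAlaaazyZaaaaa"
result = greedy_a_dotstar_z(text, text.index("A"))
print(result)  # ('AlazyZgreeedyAlaaazyZ', 7)
```

For the sample input, the sketch needs 7 attempts at placing Z: .* starts out covering 25 characters and shrinks to 19 before Z can match, which lines up with the diagrams.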

By contrast, the reluctant (lazy) repetition in A.*?Z first matches as few . as possible, and then takes more . only as necessary. This explains why it finds two matches in the input.
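
The lazy behaviour can be sketched the same way. Again, this is a simplified illustration under the assumption that the pattern is fixed to A.*?Z; the helper names lazy_a_dotstar_z and lazy_find_all are hypothetical and exist only for this example.

```python
def lazy_a_dotstar_z(text, start):
    """Hand-rolled sketch of matching A.*?Z at position `start`.

    Returns the matched text or None. Hypothetical helper.
    """
    if start >= len(text) or text[start] != "A":
        return None
    # Lazy: .*? starts out matching nothing and takes one more
    # character only when Z fails to match at the current position.
    for end in range(start + 1, len(text)):
        if text[end] == "Z":
            return text[start:end + 1]
    return None


def lazy_find_all(text):
    """Scan left to right, collecting non-overlapping matches."""
    matches, pos = [], 0
    while pos < len(text):
        m = lazy_a_dotstar_z(text, pos)
        if m is None:
            pos += 1
        else:
            matches.append(m)
            pos += len(m)  # resume right after the match
    return matches


text = "aaaaaAlazyZgreeedyAlaaazyZaaaaa"
lazy_matches = lazy_find_all(text)
print(lazy_matches)  # ['AlazyZ', 'AlaaazyZ']
```

Because each match stops at the first Z it can reach, the scan still has input left after the first match and goes on to find the second one.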

Here's a visual representation of what the two patterns matched:

aaaaaAlazyZgreeedyAlaaazyZaaaaa
     \____/l      \______/l      l = lazy
     \_________g_________/       g = greedy
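
The spans in the diagram can be checked against a real engine. The snippet below uses Python's standard re module, chosen here only as a convenient engine that supports lazy quantifiers; the variable names are arbitrary.

```python
import re

text = "aaaaaAlazyZgreeedyAlaaazyZaaaaa"

# Collect (start, end, matched text) for every match of each pattern.
greedy = [(m.start(), m.end(), m.group()) for m in re.finditer(r"A.*Z", text)]
lazy = [(m.start(), m.end(), m.group()) for m in re.finditer(r"A.*?Z", text)]

print(greedy)  # [(5, 26, 'AlazyZgreeedyAlaaazyZ')]
print(lazy)    # [(5, 11, 'AlazyZ'), (18, 26, 'AlaaazyZ')]
```

The end offsets are exclusive, so the greedy match covers characters 5 through 25 and the two lazy matches cover 5 through 10 and 18 through 25.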

Example based on an answer by polygenelubricants.

The POSIX standard does not define the lazy ? modifier (as in *? or +?), so many POSIX regex engines do not support lazy matching. While refactoring the pattern, especially with the "greatest trick ever", may help in some cases, the only way to get true lazy matching is to use an engine that supports it.
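
As one illustration of such refactoring (a common workaround, and not necessarily the trick alluded to above), the lazy A.*?Z can sometimes be replaced by a negated character class, A[^Z]*Z, which uses only syntax that POSIX engines also accept. The sketch below runs it through Python's re module purely for demonstration; Python is not a POSIX engine, and the equivalence is an assumption that holds only when the terminator is a single character and the input contains no newlines.

```python
import re

text = "aaaaaAlazyZgreeedyAlaaazyZaaaaa"

# [^Z]* is still greedy, but it cannot run past a Z,
# so each match ends at the first Z after its A.
refactored = re.findall(r"A[^Z]*Z", text)
lazy = re.findall(r"A.*?Z", text)

print(refactored)  # ['AlazyZ', 'AlaaazyZ']
```

This rewrite does not generalize to multi-character terminators, which is one reason it cannot fully replace true lazy matching.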


