Greedy and Lazy Quantifiers in Regular Expressions: Boundaries with Multiple Matches


When your input has well-defined boundaries and you expect more than one match in the string, you have two options:

  • Using lazy quantifiers;
  • Using a negated character class.

Consider the following:

Suppose you have a simple templating engine and want to replace substrings like $[foo], where foo can be any string. Each such substring should be replaced with a value that depends on the part between the [].

You can try something like \$\[(.*)\], and then use the first capture group.

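The problem is that with a string like "something $[foo] lalala $[bar] something else", the greedy .* consumes as much as it can and only gives characters back until \] can match the last ] in the string. Your match will be: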

something $[foo] lalala $[bar] something else
          | \______CG1______/|

The capture group is then foo] lalala $[bar, which spans both placeholders instead of capturing a single name.
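To see the problem concretely, here is a short sketch using Python's standard re module. Python is an assumption; the technique itself is engine-agnostic. It runs the greedy pattern on the example string and prints the whole match and the first capture group.

```python
import re

# The greedy pattern from the text: (.*) grabs everything up to the end of
# the line, then backtracks only until \] can match, i.e. at the LAST "]".
GREEDY = re.compile(r"\$\[(.*)\]")

text = "something $[foo] lalala $[bar] something else"

match = GREEDY.search(text)
print(match.group(0))  # $[foo] lalala $[bar]
print(match.group(1))  # foo] lalala $[bar

# findall returns a single item: both placeholders were merged into one match.
print(GREEDY.findall(text))  # ['foo] lalala $[bar']
```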

You have two solutions:

  1. Using laziness: make * lazy by appending ?, so that it matches as few characters as possible and the match stops at the first ]. Change your expression to \$\[(.*?)\].

  2. Using a negated character class: [^\]] matches any character except ], so the capture can never run past the closing bracket. Change your expression to \$\[([^\]]*)\].
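The two fixes can be compared side by side, again assuming Python's re module. Both return the two names for the example string, and the spans of the whole matches line up with the $[foo] and $[bar] positions in the string.

```python
import re

LAZY = re.compile(r"\$\[(.*?)\]")        # solution 1: lazy quantifier
NEGATED = re.compile(r"\$\[([^\]]*)\]")  # solution 2: negated character class

text = "something $[foo] lalala $[bar] something else"

print(LAZY.findall(text))     # ['foo', 'bar']
print(NEGATED.findall(text))  # ['foo', 'bar']

# (start, end) of each whole match, end exclusive.
print([m.span() for m in NEGATED.finditer(text)])  # [(10, 16), (24, 30)]

# The patterns differ on newlines: "." does not match "\n" by default,
# while [^\]] matches any character except "]", newlines included.
multiline = "$[a\nb]"
print(LAZY.findall(multiline))     # []
print(NEGATED.findall(multiline))  # ['a\nb']
```

As the last two lines show, the patterns only agree when a placeholder never spans lines, unless you compile the lazy version with re.DOTALL.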

With either solution, the result is the same:

something $[foo] lalala $[bar] something else
          | \_/|        | \_/|
          \____/        \____/

The capture groups are foo and bar, respectively.
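Putting this back into the templating scenario, a minimal substitution sketch could look like this. The render function, the values dictionary and the choice to leave unknown names untouched are all hypothetical, not part of any existing engine.

```python
import re

# Negated-character-class version: the name is everything up to the first "]".
PLACEHOLDER = re.compile(r"\$\[([^\]]*)\]")

def render(template: str, values: dict[str, str]) -> str:
    """Hypothetical helper: replace each $[name] with values[name].

    Names missing from `values` are left as-is (an assumed policy).
    """
    def substitute(match: re.Match) -> str:
        name = match.group(1)
        return values.get(name, match.group(0))

    # re.sub calls `substitute` once per non-overlapping match.
    return PLACEHOLDER.sub(substitute, template)

print(render("something $[foo] lalala $[bar] something else",
             {"foo": "FOO", "bar": "BAR"}))
# something FOO lalala BAR something else
```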

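Using a negated character class reduces backtracking: [^\]]* runs straight to the first ], so \] matches on the first try, whereas the lazy .*? grows one character at a time and retries \] after each step. On large inputs this can save your CPU a lot of time.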
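To check this on your own setup, a rough timeit sketch like the one below compares both patterns on a large input. The input is made up (2,000 placeholders with long names), and the repeat count is arbitrary. Timings depend on the engine, its version and the data, so no particular numbers are claimed here.

```python
import re
import timeit

LAZY = re.compile(r"\$\[(.*?)\]")
NEGATED = re.compile(r"\$\[([^\]]*)\]")

# Hypothetical large input: many placeholders, each with a long name.
big = " ".join(f"$[{'x' * 200}{i}]" for i in range(2_000))

# Both patterns must agree before comparing their speed is meaningful.
assert LAZY.findall(big) == NEGATED.findall(big)

for label, pattern in (("lazy", LAZY), ("negated", NEGATED)):
    # The lambda is called inside timeit during this same iteration,
    # so it always uses the current `pattern`.
    seconds = timeit.timeit(lambda: pattern.findall(big), number=10)
    print(f"{label:8} {seconds:.3f}s")
```

In a backtracking engine like CPython's you would expect the negated class to come out ahead on input like this, because it never has to retry \] inside a name. Still, measure rather than assume.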