Regular Expressions What causes Backtracking?


Example

To find a match, the regex engine will consume characters one by one. When a partial match begins, the engine will remember the start position so it can go back in case the following characters don't complete the match.

  • If the match is complete, the is no backtracking
  • If the match isn't complete, the engine will backtrack the string (like when you rewind an old tape) to try to find a whole match.

For example: \d{3}[a-z]{2} against the string abc123def will be browsed as such:

abc123def
^ Does not match \d
abc123def
 ^ Does not match \d
abc123def
  ^ Does not match \d
abc123def
   ^ Does match \d (first one)
abc123def
    ^ Does match \d (second one)
abc123def
     ^ Does match \d (third one)
abc123def
      ^ Does match [a-z] (first one)
abc123def
       ^ Does match [a-z] (second one)
           MATCH FOUND

Now lets change the regex to \d{2}[a-z]{2} against the same string (abc123def):

abc123def
^ Does not match \d
abc123def
 ^ Does not match \d
abc123def
  ^ Does not match \d
abc123def
   ^ Does match \d (first one)
abc123def
    ^ Does match \d (second one)
abc123def
     ^ Does not match [a-z]
abc123def
    ^ BACKTRACK to catch \d{2} => (23)
abc123def
      ^ Does match [a-z] (first one)
abc123def
       ^ Does match [a-z] (second one)
           MATCH FOUND