Why does a regex skip some closing brackets/parentheses and match them afterwards?


Consider this example:

He went into the cafe "Dostoevski" and said: "Good evening."

Here we have two pairs of quotes. Let's assume we want to match each quoted string separately, so that our regex finds two matches: "Dostoevski" and "Good evening."

At first, you could be tempted to keep it simple:

".*"  # intended: a quote, then any characters up to the next quote

But it doesn't work: it produces a single match that runs from the opening quote of "Dostoevski" all the way to the closing quote of "Good evening.", including the " and said: " part in between.
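To reproduce this outside an online tester, here is a minimal sketch using Python's standard `re` module (the choice of Python is ours; the pattern behaves the same way in most regex flavors). `re.findall` returns every non-overlapping match, and with the greedy pattern there is only one.

```python
import re

text = 'He went into the cafe "Dostoevski" and said: "Good evening."'

# Greedy .* runs to the end of the line, then backtracks to the LAST quote,
# so both quoted strings end up inside one match.
greedy_matches = re.findall(r'".*"', text)
print(greedy_matches)
# ['"Dostoevski" and said: "Good evening."']
```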

Why did it happen?

This happens because .* is greedy. When the regex engine reaches .*, it "eats up" as many characters as it can, all the way to the end of the input (more precisely, to the end of the line, since . does not match a newline by default). Then it still needs to match the final ". So it "backs off" from the end, giving back one character at a time until the next character is a ". The first " it reaches while backing off is, of course, the last " in the string: the one closing "Good evening."
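The "eat everything, then back off" behavior can be modeled by hand. The function below, `greedy_quote_match`, is a hypothetical helper written only for illustration: it simplifies the engine to a single attempt at a given start index and is not how Python's `re` module is actually implemented. It should still return the same text that `re` matches for ".*".

```python
import re
from typing import Optional


def greedy_quote_match(text: str, start: int) -> Optional[str]:
    """Simplified, hand-written model of trying the pattern ".*" at index `start`.

    Illustration only: assumes one attempt at `start` and treats a newline as
    the end of the input, because . does not match a newline by default.
    """
    if start >= len(text) or text[start] != '"':
        return None  # the opening quote has to match first
    # Step 1: .* is greedy, so it first consumes everything up to the end of the line.
    line_end = text.find("\n", start + 1)
    pos = len(text) if line_end == -1 else line_end
    # Step 2: the closing quote must match at `pos`; if it does not,
    # .* gives back one character and the closing quote is tried again.
    while pos > start:
        if pos < len(text) and text[pos] == '"':
            return text[start:pos + 1]
        pos -= 1
    return None


text = 'He went into the cafe "Dostoevski" and said: "Good evening."'
first_quote = text.index('"')
print(greedy_quote_match(text, first_quote))
# "Dostoevski" and said: "Good evening."
```

Because the backing off starts from the end of the line, the first quote it meets is the rightmost one, which is exactly why the match swallows " and said: ".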

How can we prevent this and stop at the nearest closing quote?

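Use the negated character class [^"]* instead of .*:

"[^"]*"  # a quote, then any characters except a quote, then a quote

It doesn't eat up all the input: [^"] cannot match a ", so [^"]* stops right before the first " it meets, which is exactly where the closing quote has to match.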
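Running the corrected pattern on the same sentence, again with Python's `re` module as an assumed example environment, now yields two separate matches. Wrapping [^"]* in a capturing group makes `re.findall` return only the text between the quotes.

```python
import re

text = 'He went into the cafe "Dostoevski" and said: "Good evening."'

# [^"]* cannot cross a quote, so each match stops at the nearest closing quote.
quoted = re.findall(r'"[^"]*"', text)
print(quoted)  # ['"Dostoevski"', '"Good evening."']

# With one capturing group, findall returns just the group's contents.
contents = re.findall(r'"([^"]*)"', text)
print(contents)  # ['Dostoevski', 'Good evening.']
```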