Often you want to match an expression only in specific places, leaving it untouched everywhere else. Consider the following sentence:
An apple a day keeps the doctor away (I eat an apple everyday).
Here "apple" occurs twice, once outside and once inside the parentheses. Matching only the occurrence outside can be done with so-called backtracking control verbs, which are supported by the newer regex module. The idea is:
forget_this | or this | and this as well | (but keep this)
With our apple example, this would be:
import regex as re
string = "An apple a day keeps the doctor away (I eat an apple everyday)."
rx = re.compile(r'''
\([^()]*\) (*SKIP)(*FAIL) # match anything in parentheses and "throw it away"
| # or
apple # match an apple
''', re.VERBOSE)
apples = rx.findall(string)
print(apples)
# ['apple'] (only one match, the occurrence outside the parentheses)
This matches "apple" only when it can be found outside of the parentheses.
Here is how it works: scanning from left to right, the engine first consumes the parenthesised text, and (*SKIP) then acts as an "always-true assertion". Afterwards, the match correctly fails on (*FAIL) and the engine backtracks. While backtracking (that is, moving from right to left) it reaches (*SKIP), where it is forbidden to go any further to the left. Instead, the engine is told to throw away everything to the left and to resume at the position where (*SKIP) was invoked.
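If the regex module is not available, the same "forget this | (but keep this)" idea can be approximated with the standard library's re module. This is a different technique and does not use backtracking control verbs: the unwanted alternative is matched without a capture group, the wanted one is captured, and the empty matches are filtered out afterwards. A minimal sketch, assuming the same example sentence (the names text, pattern and apples are only illustrative):

```python
import re

text = "An apple a day keeps the doctor away (I eat an apple everyday)."

# Left branch: parenthesised text, matched but not captured (the "forget this" part).
# Right branch: the word we actually want, captured in group 1 ("keep this").
pattern = re.compile(r"\([^()]*\)|(apple)")

# Keep only those matches in which group 1 took part.
apples = [m.group(1) for m in pattern.finditer(text) if m.group(1) is not None]
print(apples)  # ['apple']
```

The trade-off is that the filtering happens in Python rather than inside the pattern, so this variant does not work directly with calls such as sub() unless a replacement function is used.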