Regular Expressions DFA


Example

A DFA (Deterministic Finite Automaton) engine is driven by the input.

Principle

The algorithm scans through the input string once, and remembers all possible paths in the regex which could match. For instance, when an alternation is encountered in the pattern, two new paths are created and attempted independently. When a given path does not match, it is dropped from the possibilities set.

Implications

The matching time is bounded by the input string size. There is no backtracking, and the engine can find multiple matches simultaneously, even overlapping matches.

The main drawback of this method is the reduced feature set which can be supported by the engine, compared to the NFA engine type.

Example

Match a(b|c)a against abadaca:

abadaca      a(b|c)a
^            ^        Attempt 1      ==> CONTINUE

abadaca      a(b|c)a
 ^           ^        Attempt 2      ==> FAIL
               ^      Attempt 1.1    ==> CONTINUE
                 ^    Attempt 1.2    ==> FAIL

abadaca      a(b|c)a
  ^          ^        Attempt 3      ==> CONTINUE
                   ^  Attempt 1.1    ==> MATCH

abadaca      a(b|c)a
   ^         ^        Attempt 4      ==> FAIL
               ^      Attempt 3.1    ==> FAIL
                 ^    Attempt 3.2    ==> FAIL

abadaca      a(b|c)a
    ^        ^        Attempt 5      ==> CONTINUE

abadaca      a(b|c)a
     ^       ^        Attempt 6      ==> FAIL
               ^      Attempt 5.1    ==> FAIL
                 ^    Attempt 5.2    ==> CONTINUE

abadaca      a(b|c)a
      ^      ^        Attempt 7      ==> CONTINUE
                   ^  Attempt 5.2    ==> MATCH

abadaca      a(b|c)a
       ^       ^      Attempt 7.1    ==> FAIL
                 ^    Attempt 7.2    ==> FAIL