Regular Expressions: Back Reference Basics


A back reference matches the same text that an earlier capturing group matched. This lets you reuse what an earlier part of your pattern captured and ensure that two pieces of a string are identical.

For example, suppose you want to verify that a string consists of a digit from zero to nine, a separator (a hyphen, a slash, or a space), a lowercase letter, another separator, and then another digit from zero to nine. You could use a regex like this:

[0-9][-/ ][a-z][-/ ][0-9]

This would match 1-a-4, but it would also match 1-a/4 or 1 a-4. If we want the two separators to be the same, we can use a capture group and a back reference. The back reference looks up the text matched by the indicated capture group and requires exactly the same text to appear at the position of the back reference.

Using our same example, the regex would become:

[0-9]([-/ ])[a-z]\1[0-9]

The \1 refers to the text matched by the first capture group in the pattern. With this small change, the regex now matches 1-a-4 or 1 a 4 but not 1 a-4 or 1-a/4.
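
The difference between the two patterns can be checked with a short script. This is a minimal sketch using Python's re module; the variable names loose, strict, and samples are illustrative only, and fullmatch is used on the assumption that the whole string should be verified rather than searched.

```python
import re

# Without a back reference, the two separators are matched independently.
loose = re.compile(r"[0-9][-/ ][a-z][-/ ][0-9]")
# With a back reference, \1 must repeat whatever group 1 captured.
strict = re.compile(r"[0-9]([-/ ])[a-z]\1[0-9]")

samples = ["1-a-4", "1 a 4", "1-a/4", "1 a-4"]

for text in samples:
    loose_ok = loose.fullmatch(text) is not None
    strict_ok = strict.fullmatch(text) is not None
    print(f"{text!r}: loose={loose_ok}, strict={strict_ok}")

# '1-a-4': loose=True, strict=True
# '1 a 4': loose=True, strict=True
# '1-a/4': loose=True, strict=False
# '1 a-4': loose=True, strict=False
```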

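The number to use for your back reference depends on the position of your capture group in the pattern. You can find it by counting your capture groups from left to right, starting at one. Back references \1 through \9 are widely supported; some regex flavors also accept higher numbers.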

([0-9])([-/ ])[a-z][-/ ]([0-9])
|--1--||--2--|          |--3--|
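
To confirm the numbering, you can ask a regex engine what each group captured. The following is a small sketch in Python; the names numbered and match and the sample string 1-a/4 are chosen only for illustration.

```python
import re

# Three capture groups, numbered left to right. The second separator
# is not inside parentheses, so it does not receive a number.
numbered = re.compile(r"([0-9])([-/ ])[a-z][-/ ]([0-9])")

match = numbered.fullmatch("1-a/4")

print(numbered.groups)   # 3 capture groups in the pattern
print(match.group(1))    # 1  (first digit)
print(match.group(2))    # -  (first separator)
print(match.group(3))    # 4  (last digit)
```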

Nested capture groups change this count slightly. Groups are numbered by the position of their opening parenthesis: you first count the outer capture group, then the groups inside it from left to right, and the count continues after you leave the nest:

(([0-9])([-/ ]))([a-z])
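
The numbering of the nested pattern above can be inspected the same way. This is a brief sketch in Python, with nested and match as illustrative names and 1-a as an arbitrary sample input.

```python
import re

# Group 1 is the outer group; groups 2 and 3 are nested inside it;
# group 4 follows after the nest is closed.
nested = re.compile(r"(([0-9])([-/ ]))([a-z])")

match = nested.fullmatch("1-a")

# groups() lists the captures in group-number order.
print(match.groups())  # ('1-', '1', '-', 'a')
```

Here group 1 holds both the digit and the separator, because the outer parentheses enclose groups 2 and 3.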