Regular Expressions: Escaping: What characters need to be escaped?


Example

Character escaping allows characters that the regex engine reserves for controlling the search to be searched for literally in the input string. Escaping depends on context, so this example does not cover string or delimiter escaping.

Backslashes

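Calling the backslash the "escape" character is a bit misleading. A backslash both removes special meaning and adds it: it toggles the metacharacter vs. literal status of the character that follows it. For example, \. turns the metacharacter . into a literal dot, while \d turns the literal letter d into the digit shorthand.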

In order to use a literal backslash anywhere in a regex, it must be escaped by another backslash.
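The toggling behaviour can be sketched in Python's re module (one Perl-style flavor among many). Raw strings are used so that string-level escaping, which this example does not cover, stays out of the way; the sample text is made up for illustration.

```python
import re

sample = r"3.14 3x14 a\d1"

# Unescaped dot is a metacharacter and matches any character.
any_char = re.findall(r"3.14", sample)

# A backslash switches the dot to a literal.
literal_dot = re.findall(r"3\.14", sample)

# A backslash switches the letter d to the digit shorthand.
digits = re.findall(r"\d", sample)

# A literal backslash needs a second backslash in front of it.
literal_backslash = re.findall(r"\\d", sample)

print(any_char, literal_dot, digits, literal_backslash)
```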

Escaping (outside character classes)

There are several characters that need to be escaped to be taken literally (at least outside char classes):

  • Brackets: []
  • Parentheses: ()
  • Curly braces: {}
  • Operators: *, +, ?, |
  • Anchors: ^, $
  • Others: ., \
  • To use a literal ^ at the start or a literal $ at the end of a regex, the character must be escaped.
  • Some flavors treat ^ and $ as metacharacters only when they are at the start or end of the regex, respectively. In those flavors, no escaping is needed elsewhere in the pattern, but it is usually best to escape them anyway.
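As a minimal sketch of the list above, again assuming Python's re module: escape_literal is a hypothetical helper written for this illustration, not a library function, and the sample text is invented. The standard library's re.escape does a similar job and may escape additional characters.

```python
import re

# Characters from the list above that are special outside character classes.
SPECIAL = set("[](){}*+?|^$.\\")

def escape_literal(text: str) -> str:
    """Hypothetical helper: prefix each special character with a backslash."""
    return "".join("\\" + ch if ch in SPECIAL else ch for ch in text)

price = "cost: $5.00 (approx.)"

# Used as a pattern unescaped, the text does not even match itself:
# $ anchors at the end of the input and ( ) form a group.
unescaped = re.search(price, price)

# After escaping, every character is taken literally.
pattern = escape_literal(price)
escaped = re.fullmatch(pattern, price)

# The standard library offers a similar service.
builtin = re.fullmatch(re.escape(price), price)

print(pattern)
```

In practice, a built-in such as re.escape is usually the safer choice when the whole search text should be literal; hand-escaping is mainly useful when only part of a pattern is literal.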

Escaping within Character Classes

  • It's best practice to escape square brackets ([ and ]) when they appear as literals in a char class. Depending on the flavor, the escape can be omitted under certain conditions, but omitting it harms readability.
  • The caret, ^, is a metacharacter when it is the first character in a char class: [^aeiou]. Anywhere else in the char class, it is just a literal character.
  • The dash, -, is a metacharacter unless it is at the beginning or end of a character class. If the first character in the char class is a caret ^, the dash is also a literal when it is the second character in the char class.
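The three rules above can be checked with a short sketch, assuming Python's re module; other flavors may differ in the details, and the input strings are made up for illustration.

```python
import re

# Escaped square brackets are literals inside a class.
brackets = re.findall(r"[\[\]]", "a[b]c")

# A leading caret negates the class.
non_vowels = re.findall(r"[^aeiou]", "regex")

# A caret anywhere else in the class is a literal.
caret_literal = re.findall(r"[a^]", "a^b")

# A dash between two characters forms a range.
dash_range = re.findall(r"[a-c]", "a-d")

# A dash at the start of the class is a literal.
dash_literal = re.findall(r"[-ac]", "a-d")

# A dash right after a negating caret is also a literal.
dash_after_caret = re.findall(r"[^-a]", "a-d")

print(brackets, non_vowels, caret_literal)
print(dash_range, dash_literal, dash_after_caret)
```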

Escaping the Replacement

There are also rules for escaping within the replacement, but none of the rules above apply. The only metacharacters are $ and \, at least in flavors where $ is used to reference capture groups (like $1 for group 1). To use a literal $, escape it: \$5.00. The same goes for a literal \: C:\\Program Files\\.
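Replacement syntax varies by flavor. As one concrete case, Python's re.sub references groups with \1 (or \g<1>) rather than $1, so in that flavor $ is an ordinary character in the replacement and only the backslash is special. A minimal sketch under that assumption, with invented sample text:

```python
import re

# Group references in the replacement: Python spells them \1, \2, ...
swapped = re.sub(r"(\w+)@(\w+)", r"\2 at \1", "user@example")

# In this flavor $ is not special in the replacement, so it needs no escape.
priced = re.sub(r"PRICE", "$5.00", "cost: PRICE")

# A literal backslash in the replacement must be doubled.
path = re.sub(r"DIR", r"C:\\Program Files\\", "cd DIR")

print(swapped)
print(priced)
print(path)
```

In flavors that do use $1 for group references, the literal dollar sign would need the escape described above.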


BRE Exceptions

While ERE (extended regular expressions) mirrors the typical, Perl-style syntax, BRE (basic regular expressions) has significant differences when it comes to escaping:

  • The shorthand syntax is different. \d, \s, \w and the like are not available. Instead, BRE has its own syntax (which POSIX confusingly calls "character classes"), like [:digit:]. These constructs must appear within a character class, as in [[:digit:]].
  • Only a few metacharacters (., *, ^, $) can be used unescaped. All of the other metacharacters work the other way around: unescaped they are literals, and they need a backslash to act as metacharacters:

Braces {}

  • a{1,2} matches the literal text a{1,2}. To match either a or aa, use a\{1,2\}.

Parentheses ()

  • (ab)\1 is invalid: the unescaped parentheses are literals, so there is no capture group 1. To fix it and match abab, use \(ab\)\1.

Backslash

  • Inside char classes (which are called bracket expressions in POSIX), backslash is not a metacharacter (and does not need escaping). [\d] matches either \ or d.
  • Anywhere else, escape as usual.

Other

  • + and ? are literals. If the BRE engine supports them as metacharacters, they must be escaped as \? and \+.