Regular expressions (also called "regex" or "regexp") define patterns that can be matched against a string. Type
?regex for the official R documentation and see the Regex Docs for more details. The most important 'gotcha' that the general SO regex topics do not teach is that most R regex functions need paired backslashes to escape characters in a pattern argument.
"[AB]" could be A or B
"[[:alpha:]]" could be any letter
"[[:lower:]]" stands for any lower-case letter. Note that "[a-z]" is close but doesn't match, e.g., accented lower-case letters such as ú
"[[:upper:]]" stands for any upper-case letter. Note that "[A-Z]" is close but doesn't match, e.g., accented upper-case letters such as Ú
"[[:digit:]]" stands for any digit: 0, 1, 2, ..., or 9, and is equivalent to "[0-9]"
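The character classes above can be tried out with grepl(), which returns TRUE for each element that contains a match. This is a minimal sketch; the sample vector x is a hypothetical input chosen only for illustration.

```r
# Hypothetical sample strings, chosen only for illustration
x <- c("A", "b", "7", "?")

in_ab    <- grepl("[AB]", x)          # TRUE only for "A"
is_alpha <- grepl("[[:alpha:]]", x)   # TRUE for the letters "A" and "b"
is_lower <- grepl("[[:lower:]]", x)   # TRUE only for "b"
is_upper <- grepl("[[:upper:]]", x)   # TRUE only for "A"
is_digit <- grepl("[[:digit:]]", x)   # TRUE only for "7"
```

Note the doubled brackets: the outer pair opens a bracket expression and the inner pair names the class.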
The quantifiers *, + and ? apply as usual in regex:
+ matches at least once,
* matches 0 or more times, and
? matches 0 or 1 time.
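A small sketch of how the three quantifiers differ, applied to the letter b between a and c. The test strings are hypothetical.

```r
# Hypothetical inputs with zero, one and two "b" characters
x <- c("ac", "abc", "abbc")

one_or_more  <- grepl("ab+c", x)  # FALSE  TRUE  TRUE
zero_or_more <- grepl("ab*c", x)  #  TRUE  TRUE  TRUE
zero_or_one  <- grepl("ab?c", x)  #  TRUE  TRUE FALSE
```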
You can specify the position of the regex in the string:
"^..." forces the regular expression to be at the beginning of the string
"...$" forces the regular expression to be at the end of the string
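The two anchors can be illustrated as follows. This is a sketch on hypothetical strings, and combining both anchors requires the whole string to match.

```r
# Hypothetical inputs with "R" at the end, at the start, and alone
x <- c("regex in R", "R and regex", "R")

starts_with_r <- grepl("^R", x)   # FALSE  TRUE  TRUE
ends_with_r   <- grepl("R$", x)   #  TRUE FALSE  TRUE
exactly_r     <- grepl("^R$", x)  # FALSE FALSE  TRUE
```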
Please note that regular expressions in R often look ever-so-slightly different from regular expressions used in other languages.
R requires double-backslash escapes (because "\" already implies escaping in general in R strings), so, for example, to capture whitespace in most regular expression engines, one simply needs to type \s, versus \\s in R.
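The doubling can be checked directly: the R parser consumes one backslash when it reads the string literal, so the regex engine only ever sees a single one. A minimal sketch with hypothetical inputs:

```r
# "\\s+" is typed with two backslashes but stores only one
pattern <- "\\s+"
n_chars <- nchar(pattern)   # 3 characters: backslash, "s", "+"

# Collapse runs of whitespace into a single space
squeezed <- gsub(pattern, " ", "too    many   spaces")

# The same rule applies to other escapes, e.g. \d for digits
pieces <- strsplit("a1b22c", "\\d+")[[1]]
```

Here squeezed is "too many spaces" and pieces is the vector "a", "b", "c". Writing "\s" with a single backslash in an ordinary R string is a parse error, since \s is not a string escape that R recognizes.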
UTF-8 characters in R should be escaped with a capital U, e.g.
[\U1F600] matches 😀, whereas in, e.g., Ruby, this would be matched with a lower-case u.
The site regex101 is a good place to check a regex online before using it in an R script.
The R Programming wikibook has a page dedicated to text processing with many examples using regular expressions.