R Language Regular Expressions (regex)

Introduction

Regular expressions (also called "regex" or "regexp") define patterns that can be matched against a string. Type ?regex for the official R documentation and see the Regex Docs for more details. The most important gotcha, and one that the general Stack Overflow regex topics do not teach, is that most R regex functions need backslashes to be doubled in the pattern argument, because the pattern is written as an R string literal.
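
As a minimal sketch of the doubled-backslash rule, the snippet below compares an unescaped dot, an escaped dot, and a fixed-string match. The file names are made up for illustration.

```r
# Hypothetical file names, used only for illustration.
files <- c("data.csv", "dataXcsv")

# An unescaped "." matches any single character, so both elements match.
unescaped <- grepl("data.csv", files)

# "\\." in R source is the two-character regex \. (a literal dot).
escaped <- grepl("data\\.csv", files)

# fixed = TRUE treats the pattern as a plain string, so no escaping is needed.
literal <- grepl("data.csv", files, fixed = TRUE)

print(unescaped)     # TRUE TRUE
print(escaped)       # TRUE FALSE
print(literal)       # TRUE FALSE
print(nchar("\\."))  # 2: the regex engine receives one backslash and one dot
```

In R 4.0.0 and later, raw string syntax such as r"(data\.csv)" is another way to avoid doubling the backslash.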

Remarks

Character classes

  • "[AB]" could be A or B
  • "[[:alpha:]]" could be any letter
  • "[[:lower:]]" stands for any lower-case letter. Note that "[a-z]" is close but doesn't match, e.g., ú.
  • "[[:upper:]]" stands for any upper-case letter. Note that "[A-Z]" is close but doesn't match, e.g., Ú.
  • "[[:digit:]]" stands for any digit: 0, 1, 2, ..., or 9, and is equivalent to "[0-9]".
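
The character classes above can be tried out with grepl(). This is a sketch on a made-up input vector. How the POSIX classes treat accented letters can depend on the locale, so expected values are shown only for the ASCII elements.

```r
# Made-up sample; "\u00fa" is ú and "\u00da" is Ú, written as escapes
# so that the source stays ASCII.
x <- c("A", "B", "c", "7", "\u00fa", "\u00da")

ab          <- grepl("[AB]", x)         # contains A or B
alpha       <- grepl("[[:alpha:]]", x)  # contains a letter
digit       <- grepl("[[:digit:]]", x)  # contains a digit
lower_posix <- grepl("[[:lower:]]", x)  # POSIX lower-case class
lower_range <- grepl("[a-z]", x)        # plain ASCII range

# First four (ASCII) elements only:
print(ab[1:4])     # TRUE TRUE FALSE FALSE
print(alpha[1:4])  # TRUE TRUE TRUE FALSE
print(digit[1:4])  # FALSE FALSE FALSE TRUE

# Side-by-side view of the two lower-case patterns; the rows for the
# accented letters are where they may differ, depending on locale.
print(data.frame(x, lower_posix, lower_range))
```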

Quantifiers

+, * and ? apply as usual in regex: + matches one or more times, * matches zero or more times, and ? matches zero or one time.
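
A short sketch of the three quantifiers applied to the letter b, on made-up strings. The patterns are wrapped in ^ and $ so that the whole string has to match.

```r
# Made-up strings with zero, one, and two b's between a and c.
x <- c("ac", "abc", "abbc")

one_or_more  <- grepl("^ab+c$", x)  # + : at least one b
zero_or_more <- grepl("^ab*c$", x)  # * : any number of b's, including none
zero_or_one  <- grepl("^ab?c$", x)  # ? : at most one b

print(one_or_more)   # FALSE TRUE TRUE
print(zero_or_more)  # TRUE TRUE TRUE
print(zero_or_one)   # TRUE TRUE FALSE
```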

Start and end of line indicators

You can anchor the regex to a position in the string:

  • "^..." forces the regular expression to be at the beginning of the string
  • "...$" forces the regular expression to be at the end of the string
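
The two anchors can be sketched with grepl() on a few made-up strings:

```r
# Made-up strings containing "apple" at different positions.
x <- c("apple pie", "pineapple", "apple")

starts_with <- grepl("^apple", x)   # "apple" at the beginning of the string
ends_with   <- grepl("apple$", x)   # "apple" at the end of the string
exact       <- grepl("^apple$", x)  # the string is exactly "apple"

print(starts_with)  # TRUE FALSE TRUE
print(ends_with)    # FALSE TRUE TRUE
print(exact)        # FALSE FALSE TRUE
```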

Differences from other languages

Please note that regular expressions in R often look ever-so-slightly different from regular expressions used in other languages.

  • R requires double-backslash escapes, because "\" is already the escape character in R string literals. For example, to match whitespace one types \s in most regular expression engines, but \\s in R.

  • Unicode characters outside the Basic Multilingual Plane must be escaped with a capital U in R, e.g. [\U{1F600}] and [\U1F600] match 😀, whereas in, e.g., Ruby, this would be matched with a lower-case u. (R's lower-case \u accepts at most four hexadecimal digits.)
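
Both differences can be sketched in a few lines. The input strings are made up, and the emoji match assumes a setup that handles UTF-8 strings; behaviour for non-ASCII text can vary by platform and locale.

```r
# "\\s" reaches the regex engine as \s, which matches the space and the tab here.
underscored <- gsub("\\s", "_", "a b\tc")
print(underscored)  # "a_b_c"

# Both escape spellings produce the same one-character string (U+1F600).
smiley     <- "\U1F600"
same_char  <- identical(smiley, "\U{1F600}")
code_point <- utf8ToInt(smiley)
print(same_char)    # TRUE
print(code_point)   # 128512, i.e. 0x1F600

# Match the character inside a bracket expression.
has_smiley <- grepl("[\U1F600]", c("hello \U1F600", "hello"))
print(has_smiley)   # TRUE FALSE
```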

Additional Resources

The site regex101 is a good place to check a regex online before using it in an R script.

The R Programming wikibook has a page dedicated to text processing with many examples using regular expressions.


