Tutorial by Examples | RIP Tutorial

Eliminating Whitespace

string <- ' some text on line one; and then some text on line two '

Trimming Whitespace

"Trimming" whitespace typically refers to removing both leading and trailing whitespace from a string. This may be done using a combination of the previous examples. gsub is used to for...
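As a minimal sketch of the trimming idea described above, the snippet below removes leading and trailing whitespace with gsub. The input string `x` and the variable names are hypothetical and chosen only for illustration; trimws() is the base R helper available since R 3.2.0.

```r
# Hypothetical input with leading and trailing spaces
x <- "  some text  "

# Two steps: strip leading whitespace, then trailing whitespace
no_leading <- gsub("^\\s+", "", x)
trimmed    <- gsub("\\s+$", "", no_leading)

# One step: match either end with alternation
trimmed_one_pass <- gsub("^\\s+|\\s+$", "", x)

# Base R (>= 3.2.0) offers trimws() for the same task
trimmed_builtin <- trimws(x)

print(trimmed)
```

All three variants leave interior whitespace untouched; only the ends of the string are affected.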

R Language • Regular Expressions (regex)

Validate a date in a "YYYYMMDD" format

It is common practice to prefix file names with the date in the format YYYYMMDD, for example: 20170101_results.csv. A date in this string format can be verified using the following regular expression:

\\d{4}(0[1-9]|1[012])(0[1-9]|[12][0-9]|3[01])

The above expression considers d...
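A short sketch of how the expression might be applied to file names follows. The file names, the `^` anchor, and the trailing `_` are assumptions added here for illustration. The pattern only checks the shape of the digits, so an impossible date such as 20170231 still matches; the as.Date() call is one possible extra check.

```r
date_regex <- "\\d{4}(0[1-9]|1[012])(0[1-9]|[12][0-9]|3[01])"

# Hypothetical file names: a valid prefix, month 13, and day 32
files <- c("20170101_results.csv",
           "20171301_results.csv",
           "20170132_results.csv")

# Anchor at the start and require the underscore after the date
has_date_prefix <- grepl(paste0("^", date_regex, "_"), files)
print(has_date_prefix)

# The pattern accepts 31 days for every month, so February 31 passes
shape_ok <- grepl(paste0("^", date_regex, "$"), "20170231")

# as.Date() returns NA for a date that does not exist in the calendar
calendar_ok <- !is.na(as.Date("20170231", format = "%Y%m%d"))
```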


Validate US States postal abbreviations

The following regex includes the 50 states and also Commonwealth/Territory (see www.50states.com):

regex <- "(A[LKSZR])|(C[AOT])|(D[EC])|(F[ML])|(G[AU])|(HI)|(I[DLNA])|(K[SY])|(LA)|(M[EHDAINSOT])|(N[EVHJMYCD])|(MP)|(O[HKR])|(P[WAR])|(RI)|(S[CD])|(T[NX])|(UT)|(V[TIA])|(W[AVIY])"

For exampl...
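The sketch below shows one way the pattern could be used to check whole strings. Because the pattern has no anchors, it matches any string that merely contains a valid abbreviation; wrapping it in `^(` and `)$` is an assumption added here, as are the test values.

```r
regex <- "(A[LKSZR])|(C[AOT])|(D[EC])|(F[ML])|(G[AU])|(HI)|(I[DLNA])|(K[SY])|(LA)|(M[EHDAINSOT])|(N[EVHJMYCD])|(MP)|(O[HKR])|(P[WAR])|(RI)|(S[CD])|(T[NX])|(UT)|(V[TIA])|(W[AVIY])"

# Wrap the alternation so the whole string must be an abbreviation
anchored <- paste0("^(", regex, ")$")

# Hypothetical inputs: two states, a territory, an invalid code, lowercase
codes <- c("CA", "NY", "PR", "XX", "ca")
is_valid <- grepl(anchored, codes)
print(is_valid)

# Without anchors, a longer string containing "CA" also matches
unanchored_hit <- grepl(regex, "XCAX")
anchored_hit   <- grepl(anchored, "XCAX")
```

Matching is case-sensitive by default; passing ignore.case = TRUE to grepl would accept lowercase input as well.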


Validate US phone numbers

The following regular expression:

us.phones.regex <- "^\\s*(\\+\\s*1(-?|\\s+))*[0-9]{3}\\s*-?\\s*[0-9]{3}\\s*-?\\s*[0-9]{4}$"

validates a phone number in the form of: +1-xxx-xxx-xxxx, including optional leading/trailing blanks at the beginning/end of each group of numbers, but not ...
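A brief sketch of the pattern in use is given below. The sample numbers are hypothetical and were picked to show a few accepted and rejected shapes; they are not taken from the original example.

```r
us.phones.regex <- "^\\s*(\\+\\s*1(-?|\\s+))*[0-9]{3}\\s*-?\\s*[0-9]{3}\\s*-?\\s*[0-9]{4}$"

# Hypothetical sample inputs
phones <- c("+1-555-123-4567",  # country code with dashes
            "555 123 4567",     # groups separated by blanks
            "5551234567",       # no separators at all
            "555-12-4567",      # middle group too short
            "(555) 123-4567")   # parentheses are not part of the pattern

is_us_phone <- grepl(us.phones.regex, phones)
print(is_us_phone)
```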


Escaping characters in R regex patterns

Since both R and regex share the escape character, "\", building correct patterns for grep, sub, gsub or any other function that accepts a pattern argument will often need pairing of backslashes. If you build a three-item character vector in which one item has a linefeed, another a tab ch...
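The sketch below illustrates the backslash pairing on a small vector. The vector `x` is a hypothetical stand-in (its third item holding a literal backslash is an assumption, not the original example). In R source, "\\" is a single backslash character, so a regex that matches one literal backslash needs four backslashes in the string literal.

```r
# Hypothetical vector: a linefeed, a tab, and one literal backslash
x <- c("a\nb", "c\td", "e\\f")

# "e\\f" is three characters long: e, one backslash, f
n_chars <- nchar(x[3])

# The pattern strings "\n" and "\t" contain the actual control characters
has_newline <- grepl("\n", x)
has_tab     <- grepl("\t", x)

# Four backslashes in source: R reduces them to two, regex reduces those to one
has_backslash <- grepl("\\\\", x)

# With fixed = TRUE the regex layer is skipped, so two are enough
has_backslash_fixed <- grepl("\\", x, fixed = TRUE)

print(has_backslash)
```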


Differences between Perl and POSIX regex

R implements two slightly different regular expression engines. The default is called POSIX-consistent; all regex functions in R also accept an option to turn on the other, Perl-style engine: perl = TRUE.

Look-ahead/look-behind

perl = TRUE enables look-ahead and look-behind...
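A minimal sketch of look-ahead and look-behind with perl = TRUE follows. The input strings and the patterns are hypothetical and only illustrate the two constructs.

```r
# Hypothetical input strings
x <- c("price: 100 USD", "price: 200 EUR")

# Look-ahead: digits only when followed by " USD"; the suffix is not consumed
usd_amounts <- regmatches(x, regexpr("\\d+(?= USD)", x, perl = TRUE))
print(usd_amounts)

# Look-behind: replace digits only when preceded by "price: "
masked <- sub("(?<=price: )\\d+", "N", x, perl = TRUE)
print(masked)
```

Because the look-around assertions do not consume characters, the surrounding text stays out of the match and out of the replacement.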
