R Language Tutorial => Eliminate duplicated consecutive elements

Example

Let's say we want to eliminate duplicated subsequence element from a string (it can be more than one). For example:

2,14,14,14,19

and convert it into:

2,14,19

Using gsub, we can achieve it:

gsub("(\\d+)(,\\1)+","\\1", "2,14,14,14,19")
[1] "2,14,19"

It works also for more than one different repetition, for example:

 > gsub("(\\d+)(,\\1)+", "\\1", "2,14,14,14,19,19,20,21")
[1] "2,14,19,20,21"

Let's explain the regular expression:

(\\d+): A group 1 delimited by () and finds any digit (at least one). Remember we need to use the double backslash (\\) here because for a character variable a backslash represents special escape character for literal string delimiters (\" or \'). \d\ is equivalent to: [0-9].
,: A punctuation sign: , (we can include spaces or any other delimiter)
\\1: An identical string to the group 1, i.e.: the repeated number. If that doesn't happen, then the pattern doesn't match.

Let's try a similar situation: eliminate consecutive repeated words:

one,two,two,three,four,four,five,six

Then, just replace \d by \w, where \w matches any word character, including: any letter, digit or underscore. It is equivalent to [a-zA-Z0-9_]:

> gsub("(\\w+)(,\\1)+", "\\1", "one,two,two,three,four,four,five,six")
[1] "one,two,three,four,five,six"
>

Then, the above pattern includes as a particular case duplicated digits case.

PDF - Download R Language for free

Previous Next

R Language

Fastest Entity Framework Extensions

Example

Got any R Language Question?

R Language

R Language Modifying strings by substitution Eliminate duplicated consecutive elements

Fastest Entity Framework Extensions

Example

Got any R Language Question?