Let's say we want to eliminate consecutive duplicated elements from a delimited string (there can be more than one run of duplicates). For example, take:
"2,14,14,14,19"
and convert it into:
"2,14,19"
Using gsub, we can achieve it:
> gsub("(\\d+)(,\\1)+", "\\1", "2,14,14,14,19")
[1] "2,14,19"
It also works when more than one value is repeated, for example:
> gsub("(\\d+)(,\\1)+", "\\1", "2,14,14,14,19,19,20,21")
[1] "2,14,19,20,21"
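Since gsub() is vectorised over its input, the same call can clean many strings at once. Here is a small sketch. The vector `ids` is made-up sample data, not something from the original example:

```r
# Made-up sample data: one delimited string per record
ids <- c("2,14,14,14,19", "2,14,14,14,19,19,20,21", "1,2,3")

# gsub() applies the same pattern and replacement to every element
cleaned <- gsub("(\\d+)(,\\1)+", "\\1", ids)
# cleaned is c("2,14,19", "2,14,19,20,21", "1,2,3")
```

Strings without consecutive duplicates, like "1,2,3", come back unchanged.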
Let's explain the regular expression:
(\\d+): group 1, delimited by (), matches one or more digits. Remember that we need a double backslash (\\) here. In an R character string the backslash is itself an escape character, so the literal "\\d" is what passes the regex token \d to gsub. \d is equivalent to [[:digit:]], i.e. [0-9].
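,: a literal comma, the delimiter between elements (a space or any other delimiter could be used instead).
\\1: a back-reference to group 1, i.e. an identical copy of the number just captured. If the next element is not the same number, the pattern doesn't match.
(,\\1)+: group 2, a comma followed by the repeated number, repeated one or more times, so the whole run of duplicates is matched at once.
The replacement "\\1" then substitutes the whole match with a single copy of group 1.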
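To see what the pieces above actually capture, here is a small sketch. writeLines() shows the pattern as the regex engine receives it, and regexec() combined with regmatches() extracts the first match together with its groups. The raw-string line assumes R >= 4.0.0, and `pattern`, `x` and `m` are only illustrative names:

```r
pattern <- "(\\d+)(,\\1)+"

# The R literal "\\d" is the two-character string \d that the regex engine sees
writeLines(pattern)   # prints (\d+)(,\1)+
nchar("\\d")          # 2: a backslash followed by "d"

# Assumes R >= 4.0.0: a raw string avoids doubling the backslashes
identical(pattern, r"[(\d+)(,\1)+]")   # TRUE

# regexec() + regmatches() return the first match and its capture groups
x <- "2,14,14,14,19,19,20,21"
m <- regmatches(x, regexec(pattern, x))[[1]]
m[1]  # "14,14,14": the whole run that gsub() replaces
m[2]  # "14": group 1, which the replacement "\\1" keeps
```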
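Let's try a similar situation: eliminating consecutive repeated words, for example turning "one,two,two,three,four,four,five,six" into "one,two,three,four,five,six".
Then we just replace \\d with \\w in the pattern.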
\w matches any word character, i.e. any letter, digit or underscore. It is equivalent to [[:alnum:]_].
> gsub("(\\w+)(,\\1)+", "\\1", "one,two,two,three,four,four,five,six")
[1] "one,two,three,four,five,six"
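The delimiter is only part of group 2, so swapping it should be enough to handle other separators. Here is a minimal sketch with a space as the delimiter. The sentence is made-up sample text:

```r
# Same idea with a space as the delimiter: only the separator inside group 2 changes
txt <- "the the cat sat sat on the mat"
dedup <- gsub("(\\w+)( \\1)+", "\\1", txt)
# dedup is "the cat sat on the mat"
```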
Since \w also matches digits, the (\\w+)(,\\1)+ pattern covers the duplicated-numbers case as a particular case.
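There is one caveat worth checking against your own data. The back-reference is not anchored to whole elements, so it can also match the beginning of a longer element. For example, "1,12" becomes "12". The sketch below shows that the word pattern reproduces the digit results, demonstrates the problem, and gives one possible fix. The fix assumes the PCRE engine (perl = TRUE) so that \b word boundaries are available. The variable names are illustrative:

```r
nums <- "2,14,14,14,19,19,20,21"

# \w also matches digits, so the word pattern reproduces the digit results
w <- gsub("(\\w+)(,\\1)+", "\\1", nums)   # "2,14,19,20,21"

# Caveat: the back-reference can match a prefix of a longer element
naive <- gsub("(\\w+)(,\\1)+", "\\1", c("1,12", "an,and"))   # "12" "and"

# One possible fix: require word boundaries (\b) on both ends, using PCRE
safe <- gsub("\\b(\\w+)(,\\1)+\\b", "\\1", c("1,12", "an,and", nums), perl = TRUE)
# safe is c("1,12", "an,and", "2,14,19,20,21")
```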