[R] Using regex to truncate repeating characters

Marc Schwartz marc_schwartz at me.com
Wed Nov 11 16:06:50 CET 2015


> On Nov 11, 2015, at 3:02 AM, Karl <josip.2000 at gmail.com> wrote:
> 
> Hi all,
> 
> I'm trying to learn how to use regex inside R. I'm far from an expert when
> it comes to this, but google is my friend when it comes to finding suitable
> pieces of syntax to start building from. For example, this post seems to do
> what I want:
> 
> http://stackoverflow.com/questions/12258622/regular-expression-to-check-for-repeating-characters
> However, how do I implement this in R? gsub()?
> For example, with Perl-style regex, are there syntax modifications that
> need to be done before it will work with R?
> 
> My task is that I want to truncate/limit repeated characters to 3. If I
> have the string:
> "Looooorem ipsum dolor sit ammmmmmet, consectetur adipiscing eliiiiiiiit"
> 
> I want to truncate it to:
> "Looorem ipsum dolor sit ammmet, consectetur adipiscing eliiit"
> 
> Thank you!
> 
> BR,
> Josip


Hi,

Not extensively tested, but something like this should work:

> text <- "Looooorem ipsum dolor sit ammmmmmet, consectetur adipiscing eliiiiiiiit"

> gsub("([[:alnum:]])\\1{3,}", "\\1\\1\\1", text)
[1] "Looorem ipsum dolor sit ammmet, consectetur adipiscing eliiit"


The regex first captures any single alphanumeric character as a group, which is represented by:

  ([[:alnum:]])

That is followed by a quantified backreference:

  \\1{3,}

which matches three or more further copies of the captured character. The whole pattern therefore matches a run of at least four identical alphanumeric characters; runs of three or fewer are left untouched.

The replacement expression:

  \\1\\1\\1

replaces the entire matched run with the captured character repeated 3 times.
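
As a quick check of the explanation above, here is a minimal sketch that wraps the same gsub() call in a small helper. The function name truncate_repeats and its max_run argument are made up for illustration and are not part of any package; the pattern itself is the one shown above, just built with sprintf() so the limit can be changed.

```r
# Hypothetical helper (name and arguments are illustrative only):
# cap runs of the same alphanumeric character at max_run characters.
truncate_repeats <- function(x, max_run = 3) {
  # One alphanumeric character followed by max_run or more copies of itself
  pattern <- sprintf("([[:alnum:]])\\1{%d,}", max_run)
  # The captured character repeated max_run times, e.g. "\\1\\1\\1"
  replacement <- paste(rep("\\1", max_run), collapse = "")
  gsub(pattern, replacement, x)
}

truncate_repeats("aaa")     # run of exactly 3: unchanged
truncate_repeats("aaaa")    # run of 4: cut back to 3
truncate_repeats("1000000") # digits count as alphanumeric
truncate_repeats("!!!!!")   # punctuation is not matched by [[:alnum:]]
```

Note that punctuation and whitespace are not touched, since [[:alnum:]] only covers letters and digits.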

See ?gsub and ?regex for some additional information.
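
On the question about Perl-style regex: gsub() has a perl argument, and setting perl = TRUE switches to the PCRE engine. As far as I can tell, the pattern above needs no changes for that; the main thing to remember when copying a Perl regex into R is that each backslash has to be doubled inside an R string. A small sketch (the variable names and the second test string are just for illustration):

```r
text <- "Looooorem ipsum dolor sit ammmmmmet, consectetur adipiscing eliiiiiiiit"

# Same pattern with the default engine and with PCRE
res_default <- gsub("([[:alnum:]])\\1{3,}", "\\1\\1\\1", text)
res_perl    <- gsub("([[:alnum:]])\\1{3,}", "\\1\\1\\1", text, perl = TRUE)
identical(res_default, res_perl)

# Variant: use "." instead of [[:alnum:]] to cap runs of ANY character,
# including punctuation
res_any <- gsub("(.)\\1{3,}", "\\1\\1\\1", "Nooooo!!!!!!", perl = TRUE)
res_any
```

The second call is a different pattern from the one above, so use it only if punctuation and spaces should be truncated as well.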


Regards,

Marc Schwartz


