[R] Why do my regular expressions require a double escape \\ to get a literal??

Berend Hasselman bhh at xs4all.nl
Fri Mar 2 11:00:53 CET 2012


On 02-03-2012, at 09:36, Roey Angel wrote:

> Hi,
> I was recently unfortunate enough to have to use regular expressions to sort out some data in R.
> I'm working on a data file which contains taxonomical data of bacteria in hierarchical order.
> A sample of this file can be generated using:
> 
> tax.data <- read.table(header=F, con <- textConnection('
> G9SS7BA01D15EC  Bacteria(100)    Cyanobacteria(84)    unclassified
> G9SS7BA01C9UIR    Bacteria(100)    Proteobacteria(94)    Alphaproteobacteria(89)
> G9SS7BA01CM00D    Bacteria(100)    Proteobacteria(99)    Alphaproteobacteria(99)
> '))
> close(con)
> 
> What I am trying to do is remove the parentheses and the number inside them (which could contain a decimal point).
> I assumed that the following command would solve it, but instead I got an error.
> 
> tax.data <- as.data.frame(apply(tax.data, 2, function(x) gsub('\(.*\)','',x)))
> Error: '\(' is an unrecognized escape in character string starting "\("
> 
> And it doesn't matter if I use perl = TRUE or not.
> To solve it I need to use a double escape sign '\\' before opening and closing the parenthesis:
> 
> tax.data <- as.data.frame(apply(tax.data, 2, function(x) gsub('\\(.*\\)','',x)))
> 
> This yields the desired result but I wonder why it does that?
> No other regular expression system I'm used to (e.g. Perl, Shell) works like that.
> 
> I'm using R 2.14 (but also R 2.10) and I get the same results on Ubuntu and win XP.
> 
> I'd appreciate any explanation.

See the section "Character vectors" in the manual "An Introduction to R", and the help page

?Quotes

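The regular expression is passed to gsub as an R character string, and the R parser processes escape sequences in string literals before gsub ever sees the pattern.
\( is not a valid escape sequence in a string, hence the error. To hand a single \ to the regular expression parser, the backslash itself has to be escaped at the string stage: \\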
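
The two stages can be made visible with a small sketch. The vector name taxa and its values are assumptions for illustration, modelled on the sample data in the question; the bracket-expression variant is an alternative that is not part of the original question.

```r
# The R parser collapses each \\ in a string literal into a single backslash.
pattern <- "\\(.*\\)"

nchar(pattern)       # number of characters actually stored in the string
writeLines(pattern)  # shows the pattern as the regex engine receives it

# Hypothetical sample values modelled on the data in the question.
taxa <- c("Bacteria(100)", "Proteobacteria(94)", "unclassified")

# Escaped parentheses: the regex engine sees \( and \) as literal parentheses.
cleaned <- gsub(pattern, "", taxa)

# Bracket expressions match the same literals without any backslashes.
cleaned_brackets <- gsub("[(].*[)]", "", taxa)

print(cleaned)
print(identical(cleaned, cleaned_brackets))
```

Note that print(pattern) would show the doubled backslashes again, because print re-escapes the string for display; writeLines or cat show the characters that are really stored.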

Berend


