[R] grep and gsub on backslash and quotes

Peter Dalgaard BSA p.dalgaard at biostat.ku.dk
Tue Aug 12 18:21:40 CEST 2003

"Simon Fear" <Simon.Fear at synequanon.com> writes:

> The following code works,  to gsub single quotes to double quotes:
> line <- gsub("'", '"', line)
> (that's a single quote within doubles then a double within singles if
> your viewer's font is not good).
> But The R Language Manual tells me that
> Quotes and other special characters within strings
> are specified using escape sequences:
> \' single quote
> \" double quote
> so why is the following wrong: gsub("\\\\'", "\\\\"", line)? That or any
> other number of backslashes (have tried all up to n=6 just for good
> measure).

There's a backslash missing in the replacement. This works:

line <- "ab\\\'cd"
gsub("\\\\'", "\\\\\"", line)

and will replace \' with  \"
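On the manual's \' and \" escapes: those act at the string level only. A minimal sketch, assuming a current R session (the names single_a, double_a, x and y are mine, purely for illustration), shows that each escape produces an ordinary one-character quote, so the first gsub call quoted above needs no backslashes in the pattern:

```r
# \' and \" are string-level escapes: each yields a single quote character.
single_a <- "\'"    # escaped single quote inside double quotes
single_b <- "'"     # the same character; no escape needed here
double_a <- "\""    # escape required: the delimiter is also a double quote
double_b <- '"'     # no escape needed inside single quotes

identical(single_a, single_b)   # TRUE
identical(double_a, double_b)   # TRUE
nchar(double_a)                 # 1

# Replacing plain quotes therefore needs no backslash in the regex.
x <- "it's"                     # hypothetical input
y <- gsub("'", "\"", x)
writeLines(y)                   # it"s
```

Backslashes only have to be doubled up when the target text itself contains a backslash.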

> BTW is it documented anywhere that you need four backslashes in an RE to
> match one in the target, when it is being passed as an argument to gsub
> or grep? How would I know how many levels of doubling up to use for any
> other functions? (I got to 4 consecutive \ by trial and error in this
> case, but have a dim memory of having read about it somewhere.)

There are two levels because backslashes are escape characters both to
R strings and regular expressions. So in the above, "line" is

ab\'cd

and the match pattern is

\\' which matches \' 

and the replacement is

\\" which becomes \"
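One way to check the two levels for yourself, as a rough sketch assuming a current R session: nchar() and writeLines() show what is left after the R parser has consumed its layer of backslashes, and gsub() then applies the regex layer. The variable names pattern, replacement and result are mine, not part of the original example.

```r
line        <- "ab\\\'cd"   # as in the example above
pattern     <- "\\\\'"
replacement <- "\\\\\""

# Level 1: the R parser. nchar() counts the characters that survive parsing.
nchar(line)          # 6: a b \ ' c d
nchar(pattern)       # 3: \ \ '
nchar(replacement)   # 3: \ \ "

# writeLines() prints the raw characters; print() would re-escape them.
writeLines(c(line, pattern, replacement))

# Level 2: the regex engine. \\ in the pattern matches one backslash,
# and \\ in the replacement emits one backslash.
result <- gsub(pattern, replacement, line)
writeLines(result)   # ab\"cd
```

Counting with nchar() before calling gsub() is a quick way to work out how many backslashes a pattern really contains.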

More interesting is

> gsub("\\'", "a", line)
[1] "ab\\'cda"
> gsub("\\'", "a", line, perl=T)
[1] "ab\\acd"

so \' matches a single quote with PCRE but not with ordinary RE. (Yes,
there's a reason...)
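
As far as I know, the reason is that GNU-style regular expressions treat \' as an anchor for the end of the string, which is why the "a" lands at the end in the first call, whereas PCRE reads it as an escaped literal quote. A single quote is not a regex metacharacter, so leaving it unescaped sidesteps the difference. A small sketch, assuming a current R session (default_engine and pcre_engine are illustrative names of mine):

```r
line <- "ab\\'cd"   # the characters a b \ ' c d

# An unescaped quote is an ordinary character in both engines.
default_engine <- gsub("'", "a", line)
pcre_engine    <- gsub("'", "a", line, perl = TRUE)

writeLines(c(default_engine, pcre_engine))   # both print ab\acd
```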

   O__  ---- Peter Dalgaard             Blegdamsvej 3  
  c/ /'_ --- Dept. of Biostatistics     2200 Cph. N   
 (*) \(*) -- University of Copenhagen   Denmark      Ph: (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)             FAX: (+45) 35327907
