[R] gsub issue in R 2.11.1, but not present in 2.9.2

Bert Gunter gunter.berton at gene.com
Tue Jun 29 20:07:32 CEST 2010


Jason:

I think it's actually even a bit worse than what Duncan said, which was:

-----------
"You need to double the backslashes to enter them in an R string.  So

gsub("N\\A", "NA", original, fixed=TRUE)

should work if original contains a single backslash, and

gsub("N\\\\A", "NA", original, fixed=TRUE)

should work if it contains a double one.  Two things add to the confusion
here:  First, a single backslash will be displayed doubled by print(). .. "
------
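
A minimal sketch of the two calls quoted above, assuming a small test vector.
The name `original` comes from the quoted advice, but its contents here are
hypothetical, as are `one_slash` and `two_slash`.

```r
# Hypothetical test data; the real content of each element is in the comment.
original <- c("N\\A", "N\\\\A", "NA")   # N\A, N\\A, NA

# fixed = TRUE treats the pattern as literal characters, not as a regex.
one_slash <- gsub("N\\A", "NA", original, fixed = TRUE)    # targets N\A
two_slash <- gsub("N\\\\A", "NA", original, fixed = TRUE)  # targets N\\A

# writeLines() shows the real content, without print()'s escaping.
writeLines(one_slash)
writeLines(two_slash)
```

Each call should only touch the element whose backslash count matches its
pattern, so in practice both may be needed if the data mixes the two forms.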

Well, let's see: (On R version 2.11.1, 2010-5-31 for Windows)

> astring <- "n\a"
> print(astring)
[1] "n\a"

So Duncan's last sentence appears to be incorrect. The "\" is not displayed
doubled. However ...

> bstring <- "N\A"
Error: '\A' is an unrecognized escape in character string starting "N\A"

What's going on? Well, the "\a" in astring is a _single_ escape sequence (for
a beep/bell sound, on Windows anyway: cat("\a") should make a sound). So the
"\" in "\a" is correctly printed undoubled. However, since the "\A" in
bstring does _not_ correspond to any escape sequence, the string "N\A"
cannot be parsed and an error is thrown. But:

> bstring <- "N\\A"
> print(bstring)
[1] "N\\A"   ## is fine

## ... Noting that  

> nchar("\\A")
[1] 2
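
To make the difference between what is stored and what print() displays
concrete, here is a small sketch. The variable names `bell` and `slash` are
only illustrative.

```r
bell  <- "n\a"    # two characters: n, then the bell character (escape \a)
slash <- "N\\A"   # three characters: N, one backslash, A

nchar(bell)       # counts the escape sequence \a as one character
nchar(slash)      # counts the doubled backslash as one character

utf8ToInt("\a")   # code point of the bell character

print(slash)       # print() shows the escaped form, backslash doubled
writeLines(slash)  # writeLines() shows the stored content, one backslash
```

So the doubling seen in print() output is only a display convention; the
string itself holds a single backslash.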

So whether a "\" needs to be doubled or not depends on whether the parser
can interpret it as part of a legitimate escape sequence, whence

gsub("\a","","\a") ## works but
gsub("\A","","\A") ## does not.
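
The failure happens when the source text is parsed, before gsub() is ever
called. A rough way to see this is to hand the source text to parse() and
catch the error. This sketch assumes an R version that rejects unrecognized
escapes (as 2.11.1 does, per the error above); the names `src_bad`,
`src_good`, `bad` and `good` are hypothetical.

```r
# The source text, as R would read it from a script, is in the comment.
src_bad  <- '"N\\A"'     # source text: "N\A"   (unrecognized escape \A)
src_good <- '"N\\\\A"'   # source text: "N\\A"  (escaped backslash)

# parse() only reads the text; nothing is evaluated here.
bad  <- tryCatch(parse(text = src_bad),  error = function(e) e)
good <- tryCatch(parse(text = src_good), error = function(e) e)

inherits(bad, "error")   # did parsing the single-backslash form fail?
is.expression(good)      # did parsing the doubled form succeed?
```

The exact wording of the error message may differ between R versions, so the
sketch checks only that an error was raised.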

To avoid such confusion, I think Duncan's advice to double backslashes
should be heeded as much as possible. Unfortunately, I don't think it's
always possible:

> newlineString <- "first line\nsecond line\n"
> print(newlineString)
[1] "first line\nsecond line\n"
> cat(newlineString)
first line
second line
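
A short sketch of why doubling is not always what one wants: "\n" and "\\n"
are different strings, so doubling the backslash changes what gets matched.
The names `newline_string`, `flattened` and `unchanged` are illustrative only.

```r
newline_string <- "first line\nsecond line\n"

nchar("\n")    # a single newline character
nchar("\\n")   # a backslash followed by the letter n

# With fixed = TRUE, the pattern "\n" is the newline character itself.
flattened <- gsub("\n", " ", newline_string, fixed = TRUE)

# The doubled form looks for a literal backslash followed by n,
# which does not occur in newline_string.
unchanged <- gsub("\\n", " ", newline_string, fixed = TRUE)

writeLines(flattened)
```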

Cheers,
Bert


Bert Gunter
Genentech Nonclinical Statistics


> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org]
> On Behalf Of Uwe Ligges
> Sent: Tuesday, June 29, 2010 4:11 AM
> To: Jason Rupert
> Cc: r-help at r-project.org
> Subject: Re: [R] gsub issue in R 2.11.1, but not present in 2.9.2
> 
> 
> 
> On 29.06.2010 12:47, Jason Rupert wrote:
> > Previously in R 2.9.2 I used the following to convert from an improperly
> formatted NA string into one that is a bit more consistent.
> >
> >
> > gsub("N\A", "NA", "N\A", fixed=TRUE)
> >
> > This worked in R 2.9.2, but now in R 2.11.1 it doesn't seem to work and
> throws the following error.
> > Error: '\A' is an unrecognized escape in character string starting "N\A"
> >
> > I guess my questions are the following:
> > (1) Is this expected behavior?
> > (2) If it is expected behavior, what is the proper way to replace "N\A"
> with "NA" and "N\\A" with "NA"?
> 
> 
> If your original text "thestring" contains "N\A", then the R
> representation is "N\\A", and hence
> 
> gsub("N\\A", "NA", thestring)
> 
> If you want to try explicitly, you need to write
> 
> gsub("N\\A", "NA", "N\\A")
> 
> If your original text contains two backslashes, both have to be escaped as
> in
> 
> gsub("N\\\\A", "NA", thestring)
> 
> Uwe Ligges
> 
> 
> > Thank you again for all the help and insight.
> >
> > ______________________________________________
> > R-help at r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> 


