[R] gsub: replacing double backslashes with single backslash

Daniel Nordlund djnordlund at frontier.com
Thu Mar 8 07:08:06 CET 2012


> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org]
> On Behalf Of Ista Zahn
> Sent: Wednesday, March 07, 2012 6:55 PM
> To: Greg Snow
> Cc: r-help at r-project.org; Markus Elze
> Subject: Re: [R] gsub: replacing double backslashes with single backslash
> 
> On Wed, Mar 7, 2012 at 12:57 PM, Greg Snow <538280 at gmail.com> wrote:
> >
> > The issue here is the difference between what is contained in a string
> > and what R displays to you.
> >
> > The string produced with the code:
> >
> > > tmp <- "C:\\"
> >
> > only has 3 characters (as David pointed out), the third of which is a
> > single backslash, since the 1st \ escapes the 2nd and the R string
> > parsing rules use the combination to put a single backslash in the
> > string.  When you print the string (whether you call print directly or
> > indirectly) the print function escapes special characters, including
> > the backslash, so you see "\\" which represents a single backslash in
> > the string.  If you use the cat function instead of the print
> > function, then you will only see a single backslash (and other escape
> > sequences such as \n will also display differently in print vs. cat
> > output).  There are other ways to see the exact string (write to a
> > file, use in certain commands, etc.) but cat is probably the simplest.
> 
> 
> Fine, but how does this help the OP (and me!) figure out how to
> replace "C:\\" with "C:\" ?
> 
> Best,
> Ista

Ista,

You have received some good descriptions and explanations of what is going on, but you don't seem to have digested them yet.  I don't blame you; I found this difficult myself when I first encountered it.  One needs to keep distinct what is actually contained in a string and how R chooses to display it under various circumstances.  Consider the example again:

> tmp <- "C:\\"

The variable tmp contains only three characters: 1. a capital C, 2. a colon, and 3. a single backslash.  You can confirm that it has only three characters like this:

> nchar(tmp)
[1] 3

If you use cat() to display the contents, you will also see that it has only three characters (I included the newline character to force a newline; print() does it automatically, but cat() doesn't):

> cat(tmp, '\n')
C:\ 

So again we see just three characters.  However, if we display the variable with print(), we will see two backslashes even though there is actually only one backslash in the variable:

> print(tmp)
[1] "C:\\"
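
As a small sketch of the same point, the contents of a string can be checked independently of how print() renders it.  The variable names n_chars and bytes below are only illustrative; nchar(), charToRaw() and writeLines() are base R functions.

```r
tmp <- "C:\\"             # source literal: C, colon, one (escaped) backslash

n_chars <- nchar(tmp)     # number of characters actually stored
bytes   <- charToRaw(tmp) # underlying bytes; 5c is the ASCII backslash

print(n_chars)            # 3
print(bytes)              # 43 3a 5c
writeLines(tmp)           # shows the contents as-is, followed by a newline
print(tmp)                # shows the escaped form, as it would be typed in code
```

writeLines() is used here instead of cat() only because it appends the newline itself; either one shows the unescaped contents.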

So when you ask, 'Fine, but how does this help the OP (and me!) figure out how to replace "C:\\" with "C:\"?', you need to be clear about whether you are talking about a string that displays with two backslashes, or a string that actually contains two consecutive backslashes, which print() will display as four consecutive backslashes.  If you are talking about a variable, tmp, that actually has two backslashes in it, then it will display like this:

> tmp
[1] "C:\\\\"
> print(tmp)
[1] "C:\\\\"
> cat(tmp,'\n')
C:\\
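
For completeness, here is a sketch of how such a variable could be created in the first place.  The name tmp2 is only illustrative; the point is that each backslash stored in the string has to be typed as two backslashes in the source code.

```r
# Four backslashes in the source literal store two backslashes in the string.
tmp2 <- "C:\\\\"

nchar(tmp2)        # 4: C, colon, backslash, backslash
charToRaw(tmp2)    # 43 3a 5c 5c
cat(tmp2, "\n")    # shows the two stored backslashes
print(tmp2)        # shows four, because print() escapes each one
```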

If you then want to change it so that it has only one backslash in it, you could do:

> tmp <- sub('\\\\', '\\', tmp)
> tmp
[1] "C:\\"
> print(tmp)
[1] "C:\\"
> cat(tmp,'\n')
C:\ 
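
If you would rather spell out "two backslashes become one" explicitly, the following is one possible sketch, not the only way to do it.  The names two, one_regex and one_fixed are illustrative.  The first call uses a regular expression, where every literal backslash must be escaped once for the regex engine and once more for the R parser; the second uses fixed = TRUE, where (as I understand the documentation) both pattern and replacement are taken literally, so only the R parser's escaping applies.

```r
two <- "C:\\\\"    # a string that really contains two backslashes

# Regex form: the pattern matches two literal backslashes, and the
# replacement "\\\\" is the regex-replacement spelling of one backslash.
one_regex <- gsub("\\\\\\\\", "\\\\", two)

# Fixed form: the pattern is the literal two-backslash string and the
# replacement is a literal single backslash.
one_fixed <- gsub("\\\\", "\\", two, fixed = TRUE)

nchar(one_regex)        # 3
cat(one_regex, "\n")
cat(one_fixed, "\n")
```

Because gsub() replaces every match, both forms would also collapse doubled backslashes elsewhere in a longer path; use sub() if only the first occurrence should change.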


Hope this is helpful,

Dan

Daniel Nordlund
Bothell, WA USA
 


