[R] gsub: replacing double backslashes with single backslash

Jeff Newmiller jdnewmil at dcn.davis.ca.us
Thu Mar 8 04:53:09 CET 2012


You are chasing your tail. You have already achieved your goal, but you don't seem to understand that.

The three characters

C:\

are represented in R as

"C:\\"

so when you see the latter, the former is what is actually already in memory.

"C:\"

is not legal R code (it is an unterminated string). It is perfectly possible to have those characters in memory, but they will not be displayed that way by the print function.
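
A minimal sketch of the point above, for anyone who wants to check it at the console. The variable name `path` is only an illustration and does not come from the original question; the functions are all base R.

```r
# 'path' is a hypothetical name used only for this illustration.
path <- "C:\\"      # source-code spelling of the three characters C : \

nchar(path)         # counts the characters actually stored
utf8ToInt(path)     # their code points; 92 is the single backslash

print(path)         # print() re-escapes the backslash when displaying
cat(path, "\n")     # cat() writes the stored characters without escaping
```

Here `nchar()` reports 3, `print()` shows `"C:\\"`, and `cat()` shows `C:\`, so the doubled backslash exists only in the printed representation and not in the string itself.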
---------------------------------------------------------------------------
Jeff Newmiller                        The     .....       .....  Go Live...
DCN:<jdnewmil at dcn.davis.ca.us>        Basics: ##.#.       ##.#.  Live Go...
                                      Live:   OO#.. Dead: OO#..  Playing
Research Engineer (Solar/Batteries            O.O#.       #.O#.  with
/Software/Embedded Controllers)               .OO#.       .OO#.  rocks...1k
--------------------------------------------------------------------------- 
Sent from my phone. Please excuse my brevity.

Ista Zahn <istazahn at gmail.com> wrote:

>On Wed, Mar 7, 2012 at 12:57 PM, Greg Snow <538280 at gmail.com> wrote:
>>
>> The issue here is the difference between what is contained in a string
>> and what R displays to you.
>>
>> The string produced with the code:
>>
>> > tmp <- "C:\\"
>>
>> only has 3 characters (as David pointed out), the third of which is a
>> single backslash, since the 1st \ escapes the 2nd and the R string
>> parsing rules use the combination to put a single backslash in the
>> string.  When you print the string (whether you call print directly or
>> indirectly) the print function escapes special characters, including
>> the backslash, so you see "\\" which represents a single backslash in
>> the string.  If you use the cat function instead of the print
>> function, then you will only see a single backslash (and other escape
>> sequences such as \n will also display differently in print vs. cat
>> output).  There are other ways to see the exact string (write to a
>> file, use in certain commands, etc.) but cat is probably the simplest.
>
>
>Fine, but how does this help the OP (and me!) figure out how to
>replace "C:\\" with "C:\" ?
>
>Best,
>Ista
>>
>>
>> On Wed, Mar 7, 2012 at 7:57 AM, David Winsemius <dwinsemius at comcast.net> wrote:
>> >
>> > On Mar 7, 2012, at 6:54 AM, Markus Elze wrote:
>> >
>> >> Hello everybody,
>> >> this might be a trivial question, but I have been unable to find this
>> >> using Google. I am trying to replace double backslashes with single
>> >> backslashes using gsub.
>> >
>> >
>> > Actually you don't have double backslashes in the argument you are
>> > presenting to gsub. The string entered at the console as "C:\\" only has a
>> > single backslash.
>> >
>> >> nchar("C:\\")
>> > [1] 3
>> >
>> >
>> >> There seems to be some unexpected behaviour with regard to the
>> >> replacement string "\\". The following example uses the string C:\\ which
>> >> should be converted to C:\ .
>> >>
>> >> > gsub("\\\\", "\\", "C:\\")
>> >> [1] "C:"
>> >
>> >
>> > But I do not understand that returned value, either. I thought that the
>> > 'repl' argument (which I think I have demonstrated is a single backslash)
>> > would get put back in the returned value.
>> >
>> >
>> >
>> >> > gsub("\\\\", "Test", "C:\\")
>> >> [1] "C:Test"
>> >> > gsub("\\\\", "\\\\", "C:\\")
>> >> [1] "C:\\"
>> >
>> >
>> > I thought the parsing rules for 'replacement' were different than the rules
>> > for 'patt'. So I'm puzzled, too. Maybe something changed in 2.14?
>> >
>> >> sub("\\\\", "\\", "C:\\", fixed=TRUE)
>> > [1] "C:\\"
>> >
>> >> sub("\\\\", "\\", "C:\\")
>> > [1] "C:"
>> >> sub("([\\])", "\\1", "C:\\")
>> > [1] "C:\\"
>> >
>> > The NEWS file does say that there is a new regular expression implementation
>> > and that the help file for regex should be consulted.
>> >
>> > And presumably we should study this:
>> >
>> > http://laurikari.net/tre/documentation/regex-syntax/
>> >
>> >  In the 'replacement' argument, the "\\" is used to back-reference a
>> > numbered sub-pattern, so perhaps "\\" is now getting handled as the "null
>> > subpattern"? I don't see that mentioned in the regex help page, but it is a
>> > big "page". I also didn't see "\\" referenced in the TRE documentation, but
>> > then again I don't think that "\\" in console or source() input is a double
>> > backslash. The TRE document says that "A \ cannot be the last character of
>> > an ERE." I cannot tell whether that rule gets applied to the 'replacement'.
>> >
>> >
>> >>
>> >>
>> >> I have observed similar behaviour for fixed=TRUE and perl=TRUE. I use R
>> >> 2.14.1 64-bit on Windows 7.
>> >
>> >
>> >
>> > --
>> > David Winsemius, MD
>> > West Hartford, CT
>> >
>> >
>> > ______________________________________________
>> > R-help at r-project.org mailing list
>> > https://stat.ethz.ch/mailman/listinfo/r-help
>> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> > and provide commented, minimal, self-contained, reproducible code.
>>
>>
>>
>> --
>> Gregory (Greg) L. Snow Ph.D.
>> 538280 at gmail.com
>>
>


