[Rd] sub and gsub treat \\ incorrectly (PR#13454)

William Dunlap wdunlap at tibco.com
Tue Jan 20 00:24:56 CET 2009


> -----Original Message-----
> From: r-devel-bounces at r-project.org 
> [mailto:r-devel-bounces at r-project.org] On Behalf Of amiransk at uwo.ca
> Sent: Monday, January 19, 2009 10:25 AM
> To: r-devel at stat.math.ethz.ch
> Cc: R-bugs at r-project.org
> Subject: [Rd] sub and gsub treat \\ incorrectly (PR#13454)
> 
> Sub and gsub treat \\ replacement pattern incorrectly
> 
> I expect
>   sub("a","\\", "a", perl=T)
> to produce
>   [1] "\"
> instead it generates
>   [1] ""
> 
> On the other hand, if I run
>   sub("a","\\\\", "a", perl=T)
> it correctly outputs
>   [1] "\\"

The replacement pattern may include \\digit, which means
to put the digit'th parenthesized subexpression into the
replacement.  E.g.
   > sub("([[:alpha:]]+) +([[:alpha:]]+)", "\\2 \\1", "One two three four five")
   [1] "two One three four five"
   > gsub("([[:alpha:]]+) +([[:alpha:]]+)", "\\2 \\1", "One two three four five")
   [1] "two One four three five"
To support this without ambiguity or surprises, \\ is expected
to be followed by a digit (or L or U when perl=TRUE).
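
As a minimal sketch of the rule above (the variable names are only for illustration, and the outputs are what I would expect from a current R), an escaped backslash in the replacement, written as four backslashes in R source, is one way to get a single literal backslash in the result:

```r
# Backreferences: "\\2 \\1" swaps the first two captured words.
swapped <- sub("([[:alpha:]]+) +([[:alpha:]]+)", "\\2 \\1", "One two three")
print(swapped)               # [1] "two One three"

# A literal backslash in the result needs an escaped backslash in the
# replacement, i.e. four backslashes in R source code.
one_backslash <- sub("a", "\\\\", "a")
print(nchar(one_backslash))  # [1] 1
writeLines(one_backslash)    # writes the single backslash character

# With perl = TRUE, \\U upper-cases what follows it in the replacement.
upper <- sub("(one)", "\\U\\1", "one two", perl = TRUE)
print(upper)                 # [1] "ONE two"
```

Note that print() shows the one-character result as "\\" because it re-escapes the backslash; nchar() and writeLines() show what the string really holds.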

When fixed=TRUE there is no possibility of a parenthesized
subexpression, so \\2 is taken literally.
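
A small sketch of the fixed=TRUE case, again with illustrative variable names: the replacement is copied as-is, so backslashes are neither interpreted nor dropped.

```r
# With fixed = TRUE the replacement is used verbatim.
literal_ref <- sub("a", "\\1", "a", fixed = TRUE)
print(nchar(literal_ref))    # [1] 2  (a backslash and the digit 1)
writeLines(literal_ref)      # \1

# Two backslashes in the replacement stay two backslashes in the result.
two_backslashes <- sub(".", "\\\\", "a.b", fixed = TRUE)
print(nchar(two_backslashes))  # [1] 4  (a, backslash, backslash, b)
```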

help(gsub) is not explicit about this behavior.

Because I initially made the same mistake, when I wrote the S+
versions of gsub and sub I included a warning when the replacement
included a \\ not followed by a digit:

  > gsub("([[:alpha:]]+) +([[:alpha:]]+)", "\\ \\", "One two three four five")
  [1] "    five"
  Warning messages:
    backslash in replacement argument of substituteString(fixed=F) is not
          followed by backslash or digit, hence backslash is omitted in:
          substituteString(pattern = pattern, replacement = replacement, x = x, extended ....
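
A similar check could be sketched in R along these lines. This is not the S+ code; has_lone_backslash() and checked_gsub() are hypothetical helper names, and the check follows the wording of the warning above (backslash or digit), so it would also flag the \\U and \\L forms that perl=TRUE accepts.

```r
# Hypothetical helper: TRUE if the replacement contains a backslash that
# is not followed by a digit or by another backslash.
has_lone_backslash <- function(replacement) {
  # Remove, left to right, every backslash followed by a digit or backslash.
  stripped <- gsub("\\\\[0-9\\\\]", "", replacement, perl = TRUE)
  # Any backslash still present is not part of such a pair.
  grepl("\\", stripped, fixed = TRUE)
}

# Hypothetical wrapper: warn about a suspicious replacement, then call gsub().
checked_gsub <- function(pattern, replacement, x, ...) {
  if (has_lone_backslash(replacement)) {
    warning("backslash in replacement is not followed by backslash or digit")
  }
  gsub(pattern, replacement, x, ...)
}

has_lone_backslash("\\2 \\1")  # FALSE: both backslashes precede digits
has_lone_backslash("\\ \\")    # TRUE: backslash followed by a space
has_lone_backslash("\\\\")     # FALSE: an escaped backslash
```

Called as checked_gsub("a", "\\", "a"), this raises the warning before handing the arguments to gsub() unchanged.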

> The same issue applies to gsub.
> 
> --please do not edit the information below--
> 
> Version:
>  platform = i386-pc-mingw32
>  arch = i386
>  os = mingw32
>  system = i386, mingw32
>  status = 
>  major = 2
>  minor = 8.1
>  year = 2008
>  month = 12
>  day = 22
>  svn rev = 47281
>  language = R
>  version.string = R version 2.8.1 (2008-12-22)
> 
> Windows XP (build 2600) Service Pack 2
> 
> Locale:
> LC_COLLATE=English_United States.1252;LC_CTYPE=English_United 
> States.1252;LC_MONETARY=English_United 
> States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252
> 
> Search Path:
>  .GlobalEnv, package:stats, package:graphics, 
> package:grDevices, package:utils, package:datasets, 
> package:methods, Autoloads, package:base
> 
> -- 
> Sincerely,
>  Andriy


