[Rd] Embedded nuls in strings

Herve Pages hpages at fhcrc.org
Tue Aug 7 23:06:56 CEST 2007


Hi,

?rawToChar
     'rawToChar' converts raw bytes either to a single character string
     or a character vector of single bytes.  (Note that a single
     character string could contain embedded nuls.)

Allowing embedded nuls in a string might be an interesting experiment, but it
seems to cause trouble for most of the string manipulation functions.

A string with an embedded 0:

  raw0 <- as.raw(c(65:68, 0, 70))
  string0 <- rawToChar(raw0)

> string0
[1] "ABCD\0F"

nchar() should return 6:
> nchar(string0)
[1] 4

In addition, this embedded nul seems to break almost all string manipulation/searching
functions:
  grep("F", string0)
  strsplit(string0, split=NULL, fixed=TRUE)[[1]]
  tolower(string0)
  chartr("F", "x", string0)
  substr(string0, 6, 6)
  etc.

Not very surprisingly, they all seem to treat string0 as if it were "ABCD"!
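
For comparison, here is a minimal sketch of the same kind of questions asked
of the raw vector instead of the string. It is only an illustration, not a fix
for the functions above, and it assumes the bytes are still available as raw0.
The names n_bytes, nul_pos, f_pos and clean are made up for this example.

```r
# Same bytes as in the example above: "ABCD", a nul byte, then "F".
raw0 <- as.raw(c(65:68, 0, 70))

# The raw vector keeps all six bytes, including the nul.
n_bytes <- length(raw0)

# Position of the embedded nul byte.
nul_pos <- which(raw0 == as.raw(0))

# Position of the byte for "F", found without going through a string.
f_pos <- which(raw0 == charToRaw("F"))

# Dropping the nul first gives bytes that convert to an ordinary string.
clean <- rawToChar(raw0[raw0 != as.raw(0)])
```

At the byte level the length is 6 and the "F" is found at position 6, which is
what one would have expected nchar() and grep() to see.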

Cheers,
H.


