[Rd] Embedded nuls in strings

Steven McKinney smckinney at bccrc.ca
Tue Aug 7 23:27:19 CEST 2007


I get similar results on an Apple Mac G5 running OS X,
except that nchar() returns the correct length (6).

> raw0 <- as.raw(c(65:68, 0, 70))
> string0 <- rawToChar(raw0)
> raw0
[1] 41 42 43 44 00 46
> string0
[1] "ABCD\0F"

> nchar(string0)
[1] 6

> grep("F", string0)
integer(0)
> strsplit(string0, split=NULL, fixed=TRUE)[[1]]
[1] "A" "B" "C" "D"
> tolower(string0)
[1] "abcd"
> chartr("F", "x", string0)
[1] "ABCD"
> substr(string0, 6, 6)
[1] ""

> sessionInfo()
R version 2.5.1 (2007-06-27) 
powerpc-apple-darwin8.9.1 

locale:
en_CA.UTF-8/en_CA.UTF-8/en_CA.UTF-8/C/en_CA.UTF-8/en_CA.UTF-8

attached base packages:
[1] "splines"   "stats"     "graphics"  "grDevices" "utils"     "datasets"  "methods"   "base"     
> 



Steven McKinney

Statistician
Molecular Oncology and Breast Cancer Program
British Columbia Cancer Research Centre

email: smckinney +at+ bccrc +dot+ ca

tel: 604-675-8000 x7561

BCCRC
Molecular Oncology
675 West 10th Ave, Floor 4
Vancouver B.C. 
V5Z 1L3
Canada




-----Original Message-----
From: r-devel-bounces at r-project.org on behalf of Herve Pages
Sent: Tue 8/7/2007 2:06 PM
To: r-devel at r-project.org
Subject: [Rd] Embedded nuls in strings
 
Hi,

?rawToChar
     'rawToChar' converts raw bytes either to a single character string
     or a character vector of single bytes.  (Note that a single
     character string could contain embedded nuls.)
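
The second form mentioned in that help text (a character vector of single
bytes) can be sketched as follows. This is only an illustration: the name
`bytes` is made up here, and the call relies on the `multiple` argument of
rawToChar(), which returns one string per input byte, with "" for a 0 byte.

```r
# Same bytes as in the example below: "ABCD", a nul byte, then "F".
raw0 <- as.raw(c(65:68, 0, 70))

# multiple = TRUE yields one string per byte instead of one single string,
# so nothing after the nul byte is lost.
bytes <- rawToChar(raw0, multiple = TRUE)

length(bytes)  # number of input bytes, nul included
bytes[6]       # the byte that follows the nul
```

In this form every byte stays addressable by position, because no single
string ever has to carry the nul.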

Allowing embedded nuls in a string might be an interesting experiment, but it
seems to cause trouble for most of the string manipulation functions.

A string with an embedded 0:

  raw0 <- as.raw(c(65:68, 0 , 70))
  string0 <- rawToChar(raw0)

> string0
[1] "ABCD\0F"

nchar() should return 6:
> nchar(string0)
[1] 4

In addition this embedded nul seems to break almost all string manipulation/searching
functions:
  grep("F", string0)
  strsplit(string0, split=NULL, fixed=TRUE)[[1]]
  tolower(string0)
  chartr("F", "x", string0)
  substr(string0, 6, 6)
  ...

Not very surprisingly, they all seem to treat string0 as if it were "ABCD"!
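
One possible way around this, sketched under the assumption that working on
the raw vector is acceptable, is to do the counting and searching on the bytes
themselves and to convert to a string only after the nuls are gone. The names
`n_bytes`, `pos_F` and `cleaned` are purely illustrative, and this is not a
substitute for nul-aware string functions.

```r
raw0 <- as.raw(c(65:68, 0, 70))

# Count bytes on the raw vector rather than calling nchar() on the string.
n_bytes <- length(raw0)

# Find a byte by comparing against its raw value instead of using grep().
pos_F <- which(raw0 == charToRaw("F"))

# Drop the nul bytes before converting, so the result is an ordinary string.
cleaned <- rawToChar(raw0[raw0 != as.raw(0)])

tolower(cleaned)
```

Here the byte count and the position of "F" come from the raw vector, so they
do not depend on how the string functions handle the nul.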

Cheers,
H.

______________________________________________
R-devel at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


