[Rd] R 2.7.0, match() and strings containing \0 - bug?

Seth Falcon seth at userprimary.net
Mon Apr 28 16:01:53 CEST 2008


Hi Jon,

* On 2008-04-28 at 11:00 +0100 Jon Clayden wrote:
> A piece of my code that uses readBin() to read a certain file type is
> behaving strangely with R 2.7.0. This seems to be because of a failure
> to match() strings after using rawToChar() when the original was
> terminated with a "\0" character. Direct equality testing with ==
> still works as expected. I can reproduce this as follows:
> 
> > x <- "foo"
> > y <- c(charToRaw("foo"),as.raw(0))
> > z <- rawToChar(y)
> > z==x
> [1] TRUE
> > z=="foo"
> [1] TRUE
> > z %in% c("foo","bar")
> [1] FALSE
> > z %in% c("foo","bar","foo\0")
> [1] FALSE
> 
> But without the nul character it works fine:
> 
> > zz <- rawToChar(charToRaw("foo"))
> > zz %in% c("foo","bar")
> [1] TRUE
> 
> I don't see anything about this in the latest NEWS, but is this
> expected behaviour? Or is it, as I suspect, a bug? This seems to be
> new to R 2.7.0, as I said.

The short answer is that your example works in R-2.6 and in the
current R-devel.  Whether the behavior in R-2.7 is a bug is perhaps in
the eye of the beholder.

Historically, R's internal string representation allowed for embedded
nul characters.  This was particularly useful before the raw vector
type, RAWSXP, was introduced.  Since the vast majority of
R's internal string processing functions use standard C semantics
and truncate at the first nul, there has always been some room for
"interesting" behavior.  The change in R-2.7 was an attempt to start
resolving these inconsistencies.  Since then the core team has agreed
to remove the partial support for embedded nuls in character strings --
raw vectors can be used when embedded nuls are needed, and having
nul-terminated strings will make the code more consistent and easier
to maintain going forward.
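
In the meantime, one way to sidestep the issue is to drop the nul
bytes while the data is still a raw vector, and only then convert to
character.  This is just a minimal sketch based on your example, not
an official recommendation; the filtering step is my own suggestion
and assumes the nuls carry no information you need to keep:

```r
# Bytes as they might come back from readBin(): "foo" plus a trailing nul.
y <- c(charToRaw("foo"), as.raw(0))

# Remove nul bytes while the data is still raw, then convert.
# (Assumption: the nuls are padding/terminators, not meaningful data.)
z <- rawToChar(y[y != as.raw(0)])

# z should now behave like an ordinary "foo" in match() and %in%.
z %in% c("foo", "bar")
```

If the nul bytes are meaningful, keeping the data as raw and comparing
raw vectors directly is probably the safer route.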

Best Wishes,

+ seth

-- 
Seth Falcon | http://userprimary.net/user/


