[Rd] Error: invalid multibyte string

Henrik Bengtsson hb at stat.berkeley.edu
Sat Oct 28 00:40:55 CEST 2006


On 10/28/06, Thomas Lumley <tlumley at u.washington.edu> wrote:
> On Fri, 27 Oct 2006, Henrik Bengtsson wrote:
>
> > In Section "Package subdirectories" in "Writing R Extensions" [2.4.0
> > (2006-10-10)] it says:
> >
> > "Only ASCII characters (and the control characters tab, formfeed, LF
> > and CR) should be used in code files. Other characters are accepted in
> > comments, but then the comments may not be readable in e.g. a UTF-8
> > locale. Non-ASCII characters in object names will normally [1] fail
> > when the package is installed. Any byte will be allowed [2] in a
> > quoted character string (but \uxxxx escapes should not be used), but
> > non-ASCII character strings may not be usable in some locales and may
> > display incorrectly in others.", where the footnote [2] reads "It is
> > good practice to encode them as octal or hex escape sequences".
> >
> > (Note: ASCII refers (correctly) to the 7-bit ASCII [0-127] and none of
> > the 8-bit ASCII extensions [128-255].)
> >
> > According to the sentence about quoted strings, the following R/*.R code
> > should still be valid:
> >
> >    pads <- sapply(0:64, FUN=function(x) paste(rep("\xFF", x), collapse=""));
>
> That looks like it should be valid (at least according to the
> documentation), even though it won't run usefully on UTF-8 locales.  What
> you wrote before was:
>
> >> > On Thu, 26 Oct 2006, Henrik Bengtsson wrote:
> >> >
> >> > > I'm observing the following on different platforms:
> >> > >
> >> > >> parse(text='"\\x7F"')
> >> > > expression("\177")
> >> > >> parse(text='"\\x80"')
> >> > > Error: invalid multibyte string
>
> and that error *is* correct behaviour -- you can't parse() something that
> isn't a valid character string.

Hmm... are you really sure?  Isn't that a (double) quoted \x80
(four characters + quotes) that has been put in a (single) quoted
string where the backslash is escaped?
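
As a rough sketch of what I mean (the variable name `s` is only for
illustration), the text handed to parse() can be inspected before any
parsing happens. It is six plain ASCII characters, with no byte above 127:

```r
# The single-quoted R literal below holds the six characters  "\x80"
# (quote, backslash, x, 8, 0, quote); `s` is a hypothetical name.
s <- '"\\x80"'

n_chars <- nchar(s)                 # number of characters in the text
bytes   <- as.integer(charToRaw(s)) # byte values of the text

print(n_chars)
print(bytes)
print(all(bytes < 128L))            # TRUE means the text is pure 7-bit ASCII
```

So the input itself is valid in any locale; a byte above 127 can only
appear after the \x80 escape inside the inner string has been interpreted.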

Maybe it is clearer to write:

> expr <- parse(text='x <- "\\x41"')
> eval(expr)
> print(x)
[1] "A"

and the same for

> expr <- parse(text='x <- "\\x7F"')
> eval(expr)
> print(x)
> expr <- parse(text='x <- "\\x80"')
> eval(expr)
> print(x)

(Unfortunately I can't access the machines that give me the errors
right now, but I assume the error occurs when eval() is called.)

/H

>
>         -thomas
>



