[Rd] Error: invalid multibyte string

Henrik Bengtsson hb at stat.berkeley.edu
Mon Oct 30 02:37:01 CET 2006


On 10/28/06, Henrik Bengtsson <hb at stat.berkeley.edu> wrote:
> On 10/28/06, Thomas Lumley <tlumley at u.washington.edu> wrote:
> > On Fri, 27 Oct 2006, Henrik Bengtsson wrote:
> >
> > > In Section "Package subdirectories" in "Writing R Extensions" [2.4.0
> > > (2006-10-10)] it says:
> > >
> > > "Only ASCII characters (and the control characters tab, formfeed, LF
> > > and CR) should be used in code files. Other characters are accepted in
> > > comments, but then the comments may not be readable in e.g. a UTF-8
> > > locale. Non-ASCII characters in object names will normally [1] fail
> > > when the package is installed. Any byte will be allowed [2] in a
> > > quoted character string (but \uxxxx escapes should not be used), but
> > > non-ASCII character strings may not be usable in some locales and may
> > > display incorrectly in others.", where the footnote [2] reads "It is
> > > good practice to encode them as octal or hex escape sequences".
> > >
> > > (Note: ASCII refers (correctly) to the 7-bit ASCII [0-127] and none of
> > > the 8-bit ASCII extensions [128-255].)
> > >
> > > According to the sentence about quoted strings, the following R/*.R code
> > > should still be valid:
> > >
> > >    pads <- sapply(0:64, FUN=function(x) paste(rep("\xFF", x), collapse=""));
> >
> > That looks like it should be valid (at least according to the
> > documentation), even though it won't run usefully on UTF-8 locales.  What
> > you wrote before was:
> >
> > >> > On Thu, 26 Oct 2006, Henrik Bengtsson wrote:
> > >> >
> > >> > > I'm observing the following on different platforms:
> > >> > >
> > >> > >> parse(text='"\\x7F"')
> > >> > > expression("\177")
> > >> > >> parse(text='"\\x80"')
> > >> > > Error: invalid multibyte string
> >
> > and that error *is* correct behaviour -- you can't parse() something that
> > isn't a valid character string.
>
> Hmm... are you really sure?  The text being parsed is a double-quoted
> \x80 (four characters plus the quotes), placed inside a single-quoted
> string in which the backslash is escaped.
>
> Maybe it is clearer to write:
>
> > expr <- parse(text='x <- "\\x41"')
> > eval(expr)
> > print(x)
> [1] "A"
>
> and the same for
>
> > expr <- parse(text='x <- "\\x7F"')
> > eval(expr)
> > print(x)
> > expr <- parse(text='x <- "\\x80"')
> > eval(expr)
> > print(x)
>
> (Unfortunately I can't access the machines that give me the errors
> right now, but I assume the error occurs when eval() is called.)

The error occurs when calling print(), i.e.

> expr <- parse(text='x <- "\\x7F"')
> eval(expr)
> print(x)
[1] "\177"

> expr <- parse(text='x <- "\\x80"')
> eval(expr)
> print(x)
[1]Error: invalid multibyte string
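
Since parse() and eval() both succeed, the string itself can presumably be inspected without going through print() on a character vector. The following is only a minimal sketch, assuming charToRaw() and nchar(type = "bytes") are not affected by the locale on the machines in question (I have not verified this there):

```r
# Sketch only: look at the raw bytes of the string instead of printing it.
expr <- parse(text = 'x <- "\\x80"')
eval(expr)

# charToRaw() returns the underlying bytes as a raw vector, so printing
# the result does not have to interpret the string as multibyte text.
bytes <- charToRaw(x)
print(bytes)

# Counting bytes (rather than characters) likewise avoids decoding the string.
n_bytes <- nchar(x, type = "bytes")
print(n_bytes)
```

If this runs cleanly where print(x) fails, it would suggest the problem is in how the string is displayed in a multibyte locale, not in how it is parsed or stored.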

/Henrik

>
> /H
>
> >
> >         -thomas
> >
>
