[Rd] Error: invalid multibyte string

Thomas Lumley tlumley at u.washington.edu
Thu Oct 26 17:32:37 CEST 2006


On Thu, 26 Oct 2006, Henrik Bengtsson wrote:

> I'm observing the following on different platforms:
>
>> parse(text='"\\x7F"')
> expression("\177")
>> parse(text='"\\x80"')
> Error: invalid multibyte string

Yes. It's an invalid multibyte string. In UTF-8 a single byte is a valid 
character string only if it is below x80, so x7F is fine but x80 is not. 
In fact x80 is not the leading byte of any valid UTF-8 character: bytes 
x80 through xBF occur only as continuation bytes, and xFF never occurs in 
UTF-8 at all.
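
A minimal sketch of that rule, assuming an R recent enough to have 
validUTF8() and bitwAnd() (neither is in R 2.4.0). The bytes are built 
with as.raw() so the example does not depend on how the parser handles 
\x escapes, and is_continuation() is a hypothetical helper written only 
for illustration:

```r
# Build one-byte strings directly from raw bytes.
b7f <- rawToChar(as.raw(0x7F))
b80 <- rawToChar(as.raw(0x80))

validUTF8(b7f)   # 0x7F is ASCII (DEL), a complete UTF-8 character
validUTF8(b80)   # 0x80 is a continuation byte and cannot stand alone

# Hypothetical helper: continuation bytes have the bit pattern 10xxxxxx.
is_continuation <- function(byte) {
  bitwAnd(as.integer(byte), 0xC0L) == 0x80L
}

is_continuation(as.raw(0x80))   # continuation byte
is_continuation(as.raw(0x7F))   # plain ASCII byte
```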

You have to work out what the Unicode code point is for whatever character 
you were expecting to be x80 and convert that to UTF-8.
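
As an illustration only: suppose x80 was meant to be the euro sign, which 
is where Windows-1252 puts it (that is my assumption, your message does 
not say which character you wanted). Its code point is U+20AC, and a 
sketch of the conversion in a current R could look like this:

```r
# Assumption: the intended character is the euro sign, U+20AC.
euro <- intToUtf8(0x20AC)

charToRaw(euro)   # the three UTF-8 bytes: e2 82 ac
utf8ToInt(euro)   # back to the code point, 8364 (0x20AC)

# The same conversion starting from the single Windows-1252 byte.
# Encoding names accepted by iconv() vary by platform; "CP1252" is
# assumed to be available here.
from_cp1252 <- iconv(rawToChar(as.raw(0x80)), from = "CP1252", to = "UTF-8")
```

The point is that the single byte x80 becomes a three-byte sequence in 
UTF-8, none of which is x80 on its own.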

I'm surprised that one of your UTF-8 machines worked -- I don't think it 
should.

 	-thomas

> ...
>> parse(text='"\\xFF"')
> Error: invalid multibyte string
>
> However,
>
> cat("\x7F\n\x80\n...\xFF\n")
>
> works.  Using R --vanilla.

> SYSTEMS GIVING THE ERROR:
>> sessionInfo()
> R version 2.4.0 (2006-10-03)
> x86_64-unknown-linux-gnu
> locale:
> LC_CTYPE=en_AU.UTF-8;LC_NUMERIC=C;LC_TIME=en_AU.UTF-8;LC_COLLATE=en_AU.UTF-8;LC_MONETARY=en_AU.UTF-8;LC_MESSAGES=en_AU.UTF-8;LC_PAPER=en_AU.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_AU.UTF-8;LC_IDENTIFICATION=C
>
> R version 2.4.0 Patched (2006-10-03 r39576)
> i686-pc-linux-gnu
> locale:
> LC_CTYPE=en_US.UTF-8;LC_NUMERIC=C;LC_TIME=en_US.UTF-8;LC_COLLATE=en_US.UTF-8;LC_MONETARY=en_US.UTF-8;LC_MESSAGES=en_US.UTF-8;LC_PAPER=en_US.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_US.UTF-8;LC_IDENTIFICATION=C
>
>
> SYSTEMS OK:
> R version 2.4.0 Under development (unstable) (2006-07-23 r38687)
> x86_64-unknown-linux-gnu
> locale:
> LC_CTYPE=en_US.UTF-8;LC_NUMERIC=C;LC_TIME=en_US.UTF-8;LC_COLLATE=en_US.UTF-8;LC_MONETARY=en_US.UTF-8;LC_MESSAGES=en_US.UTF-8;LC_PAPER=en_US.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_US.UTF-8;LC_IDENTIFICATION=C
>
> R version 2.4.0 (2006-10-03)
> i386-pc-mingw32
> locale:
> LC_COLLATE=English_United States.1252;LC_CTYPE=English_United States.1252;LC_MONETARY=English_United States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252
>
> R version 2.4.0 Patched (2006-10-10 r39600)
> i386-pc-mingw32
> locale:
> LC_COLLATE=English_Australia.1252;LC_CTYPE=English_Australia.1252;LC_MONETARY=English_Australia.1252;LC_NUMERIC=C;LC_TIME=English_Australia.1252
>
> Version 2.3.0 (2006-04-24)
> x86_64-unknown-linux-gnu
> locale: <not reported>
>
>
> All of the above have the following packages attached:
> [1] "methods"   "stats"     "graphics"  "grDevices" "utils"     "datasets"
> [7] "base"
>
> We identified this problem because R CMD check complained:
>
>> * checking package dependencies ... WARNING
>> Error in deparse(e[[2]]) : invalid multibyte string
>> Execution halted
>
> because we use "\xFF" (or "\377") in the source code to be used as a
> terminator in a vector buffer; "\0" can't be used for other reasons.
>
> Is the above a bug in R or one in my head?
>
> /H
>

Thomas Lumley			Assoc. Professor, Biostatistics
tlumley at u.washington.edu	University of Washington, Seattle



