[Rd] Error: invalid multibyte string

Peter Dalgaard p.dalgaard at biostat.ku.dk
Thu Oct 26 18:43:45 CEST 2006


Thomas Lumley <tlumley at u.washington.edu> writes:

> On Thu, 26 Oct 2006, Henrik Bengtsson wrote:
> 
> > I'm observing the following on different platforms:
> >
> >> parse(text='"\\x7F"')
> > expression("\177")
> >> parse(text='"\\x80"')
> > Error: invalid multibyte string
> 
> Yes, it's an invalid multibyte string. In UTF-8 a single byte is a valid
> character string only if it is below 0x80, so 0x7F is fine but 0x80 is not.
> In fact 0x80 is not the leading byte of any valid UTF-8 character; it is a
> continuation byte.
> 
> You have to work out what the Unicode code point is for whatever character 
> you were expecting to be x80 and convert that to UTF-8.
> 
> I'm surprised that one of your UTF-8 machines worked -- I don't think it 
> should.
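A minimal R sketch of the rule Thomas describes, assuming R >= 3.3.0 for validUTF8() (which postdates this thread). utf8_byte_role() is a hypothetical helper, not part of R, and treating 0x80 as the Windows-1252 euro sign (U+20AC) is only an assumption about where the byte came from.

```r
# Sketch only: validUTF8() needs R >= 3.3.0.

# Hypothetical helper (not part of R): report the role a single byte can
# play in UTF-8, judged from its high bits.
utf8_byte_role <- function(byte) {
  b <- as.integer(byte)
  if (b < 0x80) {
    "single-byte character"                # 0xxxxxxx: plain ASCII
  } else if (b < 0xC0) {
    "continuation byte"                    # 10xxxxxx: only valid after a lead byte
  } else if (b < 0xC2 || b > 0xF4) {
    "never valid in UTF-8"                 # overlong leads 0xC0/0xC1, and 0xF5-0xFF
  } else {
    "lead byte of a multi-byte character"  # 0xC2-0xF4
  }
}

utf8_byte_role(0x7F)  # "single-byte character"
utf8_byte_role(0x80)  # "continuation byte"

# A lone continuation byte is therefore not a valid UTF-8 string.
validUTF8(rawToChar(as.raw(0x7F)))  # TRUE
validUTF8(rawToChar(as.raw(0x80)))  # FALSE

# Thomas's fix: decide which character 0x80 was meant to be, then encode
# its code point as UTF-8. Assumption: the data came from Windows-1252,
# where byte 0x80 is the euro sign, U+20AC.
euro <- intToUtf8(0x20AC)
charToRaw(euro)  # e2 82 ac
```

If the source was ISO-8859-1 instead, 0x80 maps to the control character U+0080, so the right code point really does depend on knowing the original encoding.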

Interestingly, we can parse, but not print or deparse:

> x<-parse(text='"\\x80"')
> x
Error: invalid multibyte string
> z <- deparse(x)
Error in deparse(x) : invalid multibyte string
> cat(x[[1]])
�>

(the last line has a funny little cedilla-like symbol in position 1; the > after it is the next prompt, since cat() does not add a newline)
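Since parsing succeeds, the string can still be examined byte by byte without asking R to render it. A sketch, assuming R >= 3.3.0 for validUTF8(); how print() and deparse() handle such strings has changed in later R versions, so only byte-level checks are shown.

```r
x <- parse(text = '"\\x80"')
s <- x[[1]]   # the parsed constant: a length-one character vector

# Inspect the stored byte instead of asking R to render the string.
charToRaw(s)  # 80

# That byte is not valid UTF-8, which is what print() and deparse()
# objected to in the UTF-8 locale used above.
validUTF8(s)  # FALSE
```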

-- 
   O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark          Ph:  (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)                  FAX: (+45) 35327907

More information about the R-devel mailing list