[Rd] String encoding problem

peter dalgaard pdalgd at gmail.com
Thu Jul 7 18:51:13 CEST 2016


> On 07 Jul 2016, at 18:15 , Hadley Wickham <h.wickham at gmail.com> wrote:
> 
> Right - I'm aware of that.  But to me, it doesn't seem correct to
> print a string that is not a valid R string. Why is an unknown
> encoding printed like UTF-8?
> 

It isn't -- no valid UTF-8 string would contain the \xbf. I may be flogging a dead horse, but it seems to me that there are three alternatives:

- refuse the input (x <- "\xc9\x82\xbf" gives "sorry, not a UTF-8 string" or so)
- refuse to print it (print(x) gives "cannot print non-UTF-8 string")
- what happens now

and a fourth one might be to actually allow mixing of \u0007 and \x07 and \007, but I suspect that there are demons down the line, which is why it is not happening now. (Does it ring a bell with anyone?)
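
For concreteness, here is a rough sketch of the points above, not a proposal for how R should behave. It assumes R >= 3.3.0 for validUTF8(); require_utf8() is a hypothetical helper invented only to illustrate the first alternative, and the last step assumes the parser still rejects strings that mix \u escapes with non-ASCII \x escapes.

```r
# The string from the thread: a valid two-byte sequence followed by a stray byte.
x <- "\xc9\x82\xbf"

# The bytes as stored, independent of how print() renders them.
bytes <- charToRaw(x)

# validUTF8() checks the bytes themselves, not the declared encoding.
whole_ok  <- validUTF8(x)            # 0xbf is a continuation byte with no lead byte
prefix_ok <- validUTF8("\xc9\x82")   # 0xc9 0x82 is the UTF-8 encoding of U+0242

# Alternative 1, as a hypothetical helper: refuse invalid input up front.
require_utf8 <- function(s) {
  if (!all(validUTF8(s))) stop("sorry, not a UTF-8 string")
  s
}
refused <- tryCatch({ require_utf8(x); FALSE }, error = function(e) TRUE)

# The fourth alternative: the same string written with a \u escape for the
# valid part and a \x escape for the stray byte. The source text is parsed
# inside tryCatch() so that a parse error is captured instead of aborting.
src <- "\"\\u0242\\xbf\""
mix_error <- tryCatch({ parse(text = src); NA_character_ },
                      error = function(e) conditionMessage(e))
```

Here whole_ok and prefix_ok separate the valid prefix from the offending byte, and mix_error holds the parser's complaint (or NA if the mixed form were accepted).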

-pd


-- 
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Office: A 4.23
Email: pd.mes at cbs.dk  Priv: PDalgd at gmail.com
