[Rd] latin1,utf-8...encoding and data

Prof Brian Ripley ripley at stats.ox.ac.uk
Wed Oct 18 17:53:19 CEST 2006


Only ASCII letters are portable: those accented characters do not even 
exist in many of the encodings used for R, e.g. Russian and Japanese on 
Windows machines.
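
As a rough illustration of the point above, here is a minimal sketch of
how one might check labels (row names, factor levels) for non-ASCII
characters before distributing a data set. The helper name
has_non_ascii and the sample vector are hypothetical, and the check
assumes the strings are stored as Latin-1; it relies on iconv()
returning NA for a string it cannot convert.

```r
# Hypothetical helper: TRUE for each element containing non-ASCII bytes.
# Assumption: the input strings are Latin-1. iconv() returns NA for a
# string that cannot be represented in the target encoding.
has_non_ascii <- function(x) {
  is.na(iconv(x, from = "latin1", to = "ASCII"))
}

# Sample labels built from raw bytes so that this snippet stays pure ASCII.
# 0xE9 is e-acute in Latin-1.
labels <- c("rock", rawToChar(as.raw(c(0x72, 0x6F, 0x63, 0x68, 0xE9))))

flags <- has_non_ascii(labels)
print(flags)
```

Labels flagged this way could be replaced by ASCII spellings if
portability matters more than the accents.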

There is no way to associate an encoding with a character string in R.  We 
considered it, but it would have had severe back-compatibility problems 
and little advantage (you cannot display non-ASCII character strings 
portably: even if you have a Unicode encoding you still need to select a 
suitable font).
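
Because a string carries no record of its encoding, any conversion has
to be told the source encoding by the caller. The following is only a
sketch of that, using iconv() on a hypothetical Latin-1 string; the
variable names are illustrative and it assumes the platform's iconv
supports both "latin1" and "UTF-8".

```r
# Latin-1 bytes for "caf" followed by e-acute (0xE9), built from raw
# bytes so that this snippet stays pure ASCII.
x_latin1 <- rawToChar(as.raw(c(0x63, 0x61, 0x66, 0xE9)))

# The caller has to name the source encoding explicitly.
x_utf8 <- iconv(x_latin1, from = "latin1", to = "UTF-8")

# In UTF-8 the accented letter occupies two bytes (0xC3 0xA9).
utf8_bytes <- charToRaw(x_utf8)
print(utf8_bytes)
```

If the wrong source encoding is named, the result will be wrong without
any warning, which is part of why such conversions are hard to automate
for users.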

'B. Ripley' (sic)


On Wed, 18 Oct 2006, Stéphane Dray wrote:

> Hello,
> I have some questions concerning encoding and package distribution. We
> develop the ade4 package. Some data sets included in the package
> contain accented characters (e.g. é, è...). The data sets have been
> saved using latin1 encoding, but some of us use utf-8 and cannot see
> some data sets which contain accented characters.
> e.g:
>
> library(ade4)
> data(rankrock)
> rankrock
>
> In this case, the characters are in the row names. Other data sets have
> such characters in the data (e.g. levels of factors). A solution is to
> use iconv... this is quite easy for us but perhaps more difficult for a
> user who may have no idea of the problem. This problem is quite marginal
> for the moment, but some Linux distributions use utf-8 by default (e.g.
> Ubuntu) and I suppose that the problem will become more and more common
> in the future.
>
> So we wonder if there is a proper way to code and save these data sets.
> I have found some documents of B. Ripley and this note :
>
> http://developer.r-project.org/210update.txt
>
>  -  Names in data objects (e.g. in .rda files) are problematic.  It
>     is likely that by release time these will be treated as in
>     Latin-1.
>
> Unless I am mistaken, I did not find an answer to this problem.
>
> What are the plans of R gurus on this question?
> Thanks a lot.
> Sincerely.
>
> Please add my address in answers as I am not a subscriber of this list.
>
>
>

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

