[Rd] latin1,utf-8...encoding and data

Stéphane Dray dray at biomserv.univ-lyon1.fr
Thu Oct 19 09:46:49 CEST 2006


Thanks a lot for this clear answer. So there is no way to preserve our 
French cultural exception (accented characters) if we want to be 
international... I had thought that adding an encoding parameter to the 
data function (e.g. data(mydata, encoding="latin1")), as in the function 
'file', could be a way to solve the problem. Apparently, the problem is 
much more complicated...
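
A minimal sketch of the iconv workaround mentioned in the original message 
quoted below, assuming the strings really are stored as Latin-1 and the 
session runs in a UTF-8 locale. The helper name latin1_to_utf8 is 
hypothetical (it is not part of R or ade4), and the sample string is built 
from raw bytes only for illustration:

```r
# "réglé" spelled out as Latin-1 bytes, so the example does not depend on
# the encoding of this message (0xE9 is "é" in Latin-1).
s_latin1 <- rawToChar(as.raw(c(0x72, 0xE9, 0x67, 0x6C, 0xE9)))

# Hypothetical helper: re-encode a character vector from Latin-1 to UTF-8.
latin1_to_utf8 <- function(x) iconv(x, from = "latin1", to = "UTF-8")

s_utf8 <- latin1_to_utf8(s_latin1)

# Inspect the bytes: each 0xE9 byte becomes the two-byte sequence c3 a9.
print(charToRaw(s_utf8))
```

On a data set such as rankrock, the same helper could presumably be applied 
to rownames(rankrock), or to levels() of a factor column. This only helps a 
user who knows the data were saved as Latin-1; it does not address the 
portability problem described in the answer below.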

Sincerely.


Prof Brian Ripley wrote:

> Only ASCII letters are portable: those accented characters do not even 
> exist in many of the encodings used for R, e.g. Russian and Japanese 
> on Windows machines.
>
> There is no way to associate an encoding with a character string in 
> R.  We considered it, but it would have had severe back-compatibility 
> problems and little advantage (you cannot display non-ASCII character 
> strings portably: even if you have a Unicode encoding you still need 
> to select a suitable font).
>
> 'B. Ripley' (sic)
>
>
> On Wed, 18 Oct 2006, Stéphane Dray wrote:
>
>> Hello,
>> I have some questions concerning encoding and package distribution. We
>> develop the ade4 package. Some data sets included in the package
>> contain accented characters (e.g. é, è...). The data sets have been
>> saved using latin1 encoding, but some of us use utf-8 and cannot see
>> the data sets which contain accented characters.
>> e.g:
>>
>> library(ade4)
>> data(rankrock)
>> rankrock
>>
>> In this case, the characters are in the rownames. Other data sets have
>> such characters in the data (e.g. levels of factors...). A solution is
>> to use iconv... this is quite easy for us but perhaps more difficult for
>> a user who may have no idea of the problem. This problem is quite
>> marginal for the moment, but some Linux distributions are utf-8 by
>> default (e.g. Ubuntu) and I suppose that the problem will become more
>> and more common in the future.
>>
>> So we wonder if there is a proper way to code and save these data sets.
>> I have found some documents of B. Ripley and this note :
>>
>> http://developer.r-project.org/210update.txt
>>
>>  -  Names in data objects (e.g. in .rda files) are problematic.  It
>>     is likely that by release time these will be treated as in
>>     Latin-1.
>>
>> Unless I am mistaken, I did not find an answer to this problem.
>>
>> What are the plans of R gurus on this question ?
>> Thanks a lot.
>> Sincerely.
>>
>> Please include my address in replies, as I am not a subscriber to this list.
>>
>>
>>
>


-- 
Stéphane DRAY (dray at biomserv.univ-lyon1.fr )
Laboratoire BBE-CNRS-UMR-5558, Univ. C. Bernard - Lyon I
43, Bd du 11 Novembre 1918, 69622 Villeurbanne Cedex, France
Tel: 33 4 72 43 27 57       Fax: 33 4 72 43 13 88
http://biomserv.univ-lyon1.fr/~dray/



