[Rd] locales and readLines

Martin Morgan mtmorgan at fhcrc.org
Fri Aug 31 18:30:43 CEST 2007


R-developers,

I'm looking for some 'best practices', or perhaps an upstream solution
(I have a sense of déjà vu about this, so sorry if it's already been
asked). Problems occur when a file is encoded as latin1 but the user
has a UTF-8 locale (or, I guess, more generally when the encoding of
the input does not match that of R's locale).  Here are two examples
from the Bioconductor help list:

https://stat.ethz.ch/pipermail/bioconductor/2007-August/018947.html

(the relevant command is library(GEOquery); gse <- getGEO('GSE94'))

https://stat.ethz.ch/pipermail/bioconductor/2007-July/018204.html

I think solutions are:

* Specify the encoding when reading, e.g. on the connection passed to
  readLines.

* Convert the input using iconv.

* Tell the user to set their locale to match the input file (!)
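
To make the first two options concrete, here is a minimal sketch. It
assumes the input really is latin1; the temporary file and the variable
names are made up for illustration and are not taken from GEOquery or
the threads above.

```r
# Create a small latin1-encoded file (illustrative data only).
path <- tempfile(fileext = ".txt")
latin1_bytes <- as.raw(c(0x63, 0x61, 0x66, 0xe9, 0x0a))  # "café\n" in latin1
writeBin(latin1_bytes, path)

# Option 1: declare the file's encoding on the connection, so the input
# is re-encoded to the session's native encoding as it is read.
con <- file(path, encoding = "latin1")
via_connection <- readLines(con)
close(con)

# Option 2: read the bytes unchanged, then convert explicitly with iconv.
raw_lines <- readLines(path, warn = FALSE)
via_iconv <- iconv(raw_lines, from = "latin1", to = "UTF-8")

unlink(path)
```

In both cases the package author still has to know (or guess) that the
file is latin1, which is the burden described below.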

Unfortunately, these (1 & 2, anyway) place an extra burden on the
package author, who must become educated about locales and the encoding
conventions of the files they read, and know how R deals with encodings.

Are there other / better solutions? Any chance for some (additional)
'smarts' when reading files?

Martin
-- 
Martin Morgan
Bioconductor / Computational Biology
http://bioconductor.org


