[R] Reading in a table with ISO-latin1 encoding in MacOS-X (Intel)

Prof Brian Ripley ripley at stats.ox.ac.uk
Thu Jun 8 16:17:23 CEST 2006


You are using this as intended, although your email message came in latin9 
not latin1, which does not affect your examples.  Have you actually 
checked (e.g. via a hex dump) that the file is in latin1?
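
One way to do such a check from within R, as a minimal sketch: the file 
below is a hypothetical sample written to a temporary path purely for 
illustration (it is not your data), and the variable names are mine.

```r
# Hypothetical sample file, written for illustration only.
path <- tempfile(fileext = ".txt")
# Header line with a-umlaut stored as the single latin1 byte 0xE4.
latin1_bytes <- c(charToRaw("ajatella mietti"), as.raw(0xE4),
                  charToRaw(" pohtia\n"))
writeBin(latin1_bytes, path)

# Read the file back as raw bytes and print them in hex.
bytes <- readBin(path, what = "raw", n = file.info(path)$size)
print(bytes)

# In latin1, a-umlaut is the single byte e4; in UTF-8 it is the pair c3 a4.
has_latin1_auml <- any(bytes == as.raw(0xE4))
has_utf8_auml <- any(bytes[-length(bytes)] == as.raw(0xC3) &
                     bytes[-1] == as.raw(0xA4))
```

On your real file, a lone e4 where the "ä" should be would suggest latin1 
(or latin9), whereas c3 a4 would suggest the file is already UTF-8.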

I assume that if you converted the file to UTF-8 you then used

read.table("R_data/hs+sfnet.T.060505.tbl4", header=TRUE)

If so, you need to investigate the locale in use, because which letters 
are valid depends on the locale: in UTF-8 locales on Linux all letters in 
all languages are valid in R names, but that is not necessarily the MacOS 
interpretation.  (Invalid characters in names are converted to '.', and 
if the locale is wrong, the bytes may also be misinterpreted as 
characters.)
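
A minimal sketch of that conversion, under stated assumptions: the header 
below is hypothetical and uses "+" as a stand-in for a character that is 
invalid in names in every locale, so the result does not depend on your 
setup. Whether "ä" is treated the same way depends on the locale that 
Sys.getlocale() reports.

```r
# Report the character-type locale; which letters are valid depends on it.
print(Sys.getlocale("LC_CTYPE"))

# Hypothetical table; "+" stands in for a character invalid in names.
txt <- "ajatella hs+sfnet pohtia\nFALSE FALSE TRUE\nFALSE TRUE FALSE"

# Default: the header goes through make.names(), so "+" becomes ".".
checked <- read.table(text = txt, header = TRUE)
# check.names = FALSE keeps the header exactly as it was read.
unchecked <- read.table(text = txt, header = TRUE, check.names = FALSE)

print(names(checked))
print(names(unchecked))
```

If the two dots in 'miett..' come from the same mechanism, that would be 
consistent with the two UTF-8 bytes of "ä" each being treated as a 
separate invalid character, but that is an assumption to check against 
your locale rather than a diagnosis.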

You might find more informed help on the r-sig-mac list.


On Thu, 8 Jun 2006, Antti Arppe wrote:

> Dear colleagues in R,
>
> I have earlier been working with R in Linux, where reading in a table 
> containing Scandinavian letters ("ä", "ö", and "å") in the header as part of 
> variable names has not caused any problem whatsoever.
>
> However, doing the same in R on a new MacOS-X machine (with an Intel 
> processor), with the same original text table, does not seem to work 
> whichever way I try. Following the recommendations on the R site and using 
> the 'file' function to set the encoding breaks down at the first encounter 
> with a Scandinavian character:
>
> THINK <- read.table(file("R_data/hs+sfnet.T.060505.tbl4", 
> encoding="latin1"),header=TRUE)
> Warning messages:
> 1: invalid input found on input connection 'R_data/hs+sfnet.T.060505.tbl4'
> 2: incomplete final line found by readTableHeader on 
> 'R_data/hs+sfnet.T.060505.tbl4'
>
> A sample exemplifying such characters as variable labels is below (for which 
> the behavior of R in Mac is the same as for the larger file referred to 
> above):
>
>   ajatella miettiä pohtia
> 1     FALSE   FALSE   TRUE
> 2     FALSE   FALSE  FALSE
> 3     FALSE    TRUE  FALSE
> 4     FALSE    TRUE  FALSE
> 5      TRUE   FALSE  FALSE
> 6      TRUE   FALSE  FALSE
> 7     FALSE   FALSE  FALSE
> 8     FALSE    TRUE  FALSE
> 9     FALSE    TRUE  FALSE
> 10    FALSE   FALSE  FALSE
>
> Converting the file from ISO-latin-1 to UTF-8 (with Mac's TextEdit 
> application) allows the file to be read in its entirety, but the 
> Scandinavian character in the heading is still coerced to a period '.', or 
> in fact two (i.e. 'miettiä' -> 'miett..').
>
> Have I possibly misunderstood how the 'file' function should be used in 
> conjunction with 'read.table', or might the problem with latin1-to-utf 
> conversion be somewhere else?
>
> Appreciating any help on this matter,
>
>

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

