[R] Problem reading non-ISO data via read.csv2

Prof Brian Ripley ripley at stats.ox.ac.uk
Tue Oct 24 17:23:38 CEST 2006


On Tue, 24 Oct 2006, Gregor Gorjanc wrote:

> Hello!
>
> I have a CSV file and when I try to import it into R with read.csv2 I
> get the following error:
>
> R> read.csv2(file="pasme2.csv", na.strings="")
> Error in type.convert(data[[i]], as.is = as.is[i], dec = dec, na.strings
> = character(0)) :
> 	invalid multibyte string
>
> There are some "strange" characters in it, but I have never experienced
> such behaviour. Actually, this file was produced with R (2.4.0) on Windows!

Well, then it cannot be a UTF-8 file (there are no UTF-8 locales on 
Windows, nor any means to write a UTF-8 file here), and you have told R to 
read it in your UTF-8 locale.

You need to specify the correct encoding: see ?file. But then you would 
have to do that in any application with such a text file, as an 
application could at best guess the encoding.
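
As a minimal sketch of what ?file describes, assuming the file is in 
Latin-1 (that is only a guess; the real file may well use another 
encoding, such as CP1250), one can declare the encoding on the connection 
and let R re-encode on input. The temporary file and its contents below 
are made up for illustration:

```r
# Create a small semicolon-separated file containing a non-ASCII byte
# (0xE9 is e-acute in Latin-1; the choice of encoding is an assumption).
tmp <- tempfile(fileext = ".csv")
con <- file(tmp, open = "wb")
writeBin(c(charToRaw("name;value\ncaf"), as.raw(0xE9), charToRaw(";1,5\n")), con)
close(con)

# Declare the file's encoding on the connection; read.csv2 opens the
# connection, converts the input to the current locale, and closes it.
dat <- read.csv2(file(tmp, encoding = "latin1"), na.strings = "",
                 stringsAsFactors = FALSE)
str(dat)
```

For the file in question the call would be along the lines of 
read.csv2(file("pasme2.csv", encoding = "latin1"), na.strings = ""), with 
"latin1" replaced by whatever encoding the file really uses.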

> Utility file under Linux says:
>
> $ file pasme2.csv
> pasme.csv: Non-ISO extended-ASCII text
>
> I am attaching a few lines of the file as an example. And the mandatory info:

No attached file appeared.

> R version 2.4.0 (2006-10-03)
> i486-pc-linux-gnu
>
> locale:
> LC_CTYPE=en_GB.UTF-8;LC_NUMERIC=C;LC_TIME=en_GB.UTF-8;LC_COLLATE=en_GB.UTF-8;LC_MONETARY=en_GB.UTF-8;LC_MESSAGES=en_GB.UTF-8;LC_PAPER=en_GB.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_GB.UTF-8;LC_IDENTIFICATION=C
>
>

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595


