[R] invalid multibyte string at '<b0>C'

Milan Bouchet-Valat nalimilan at club.fr
Sat Apr 12 11:59:53 CEST 2014


On Friday, 11 April 2014 at 16:02 -0700, Fisher Dennis wrote:
> R 3.0.2
> OS X Mavericks
> 
> Colleagues
> 
> I have a file that I converted from SAS (sas7bdat) to CSV (filename:
> ORIGINAL.csv).  I try to read it with read.csv and I receive the error
> message:
> 	Error in type.convert(data[[i]], as.is = as.is[i], dec = dec,
> na.strings = character(0L)) : 
> 	  invalid multibyte string at '<b0>C'
> The problem resolves if I delete a single character from each of lines
> 2 and 4 of the file (filename: FIXED.csv)
> 
> readLines can read both files without problem and displays the
> offending character as:
> 	\xb0
> which appears to be a degree sign.
> 
> I also tried:
> 	read.csv(textConnection(readLines("ORIGINAL.csv")))
> and encountered the same error message.
> 
> In the past, I have encountered the same problem with Greek symbols
> (e.g., mu) and other special characters.
> 
> Short of editing the input file, is there a simple solution within R
> so that I can read the input data into a dataframe?
> One possible (but ugly) solution would be:
> 	TEMP	<- readLines(FILENAME)
> 	TEMP	<- gsub(offendingcharacter, replacementcharacter, TEMP)
> However, this would require that I find all possible offending
> characters and the corresponding replacements.
Well, if the conversion tool did its job correctly, you should be able to
find out which encoding the file uses and import these characters
correctly instead of removing them. The byte <b0> is the degree sign in
ISO-8859-1, so try
read.csv(file, fileEncoding="ISO-8859-1")
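
To make the suggestion concrete, here is a minimal sketch. It is not based on your actual file: it writes a small stand-in CSV with made-up columns (id, temp) that contains the raw byte <b0>, then reads it back in two ways. It assumes the file really is ISO-8859-1 and that R runs in a UTF-8 locale, and validUTF8() needs R 3.3.0 or later.

```r
# Build a tiny ISO-8859-1 file containing the raw byte 0xb0 (degree sign).
# The column names and values are hypothetical stand-ins for ORIGINAL.csv.
tmp <- tempfile(fileext = ".csv")
con <- file(tmp, open = "wb")
writeBin(c(charToRaw("id,temp\n1,20"), as.raw(0xb0), charToRaw("C\n"),
           charToRaw("2,25"), as.raw(0xb0), charToRaw("C\n")), con)
close(con)

# Option 1: declare the file's encoding and let R re-encode while reading.
dat <- read.csv(tmp, fileEncoding = "ISO-8859-1", stringsAsFactors = FALSE)

# Option 2: read the raw lines, convert them explicitly, then parse.
lines <- readLines(tmp, warn = FALSE)
bad <- which(!validUTF8(lines))   # lines holding bytes that are not valid UTF-8
lines_utf8 <- iconv(lines, from = "ISO-8859-1", to = "UTF-8")
dat2 <- read.csv(text = lines_utf8, stringsAsFactors = FALSE)
```

Option 2 is handy when you are not sure of the encoding: iconv() returns NA for any line it cannot convert from the encoding you guessed, so a wrong guess shows up before the data are parsed.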

> The files are available for inspection at:
> 	http://www.plessthan.com/FILES/ARCHIVE.zip
The link does not appear to work here.


Regards



