[R] Data bug that read.csv doesn't like

Randall Johnson [Contr] rjohnson at ncifcrf.gov
Thu Dec 20 20:24:01 CET 2007


Hello,
I have a bug in my data that read.csv doesn't like, but _only_ when
specifying "na.strings = 'missing'". If I delete the offending Chinese
characters, the problem goes away as well. I'm satisfied that the
problems with this data file are fixed, but is there anything I can
do to avoid this in the future (other than avoiding Chinese characters)?
Any ideas as to what is going on here? I've attached the piece of the
data file I used for the example below.

Best,
Randy

 > read.csv('../data/tmp.csv')
   Smoking_status Age_start_smoking         Pack_day
1              0
2              0
3        missing           missing          missing
4              1                18 \xc9\xd9\xc1\xbf
5              1                20                1
 > read.csv('../data/tmp.csv', na.strings = 'missing')
Error in type.convert(data[[i]], as.is = as.is[i], dec = dec, na.strings = character(0)) :
   invalid multibyte string
 > sessionInfo()
R version 2.6.1 (2007-11-26)
i386-apple-darwin9.1.0

locale:
en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base
 >

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Randall C Johnson
Bioinformatics Analyst
SAIC-Frederick, Inc (Contractor)
Laboratory of Genomic Diversity
NCI-Frederick, P.O. Box B
Bldg 560, Rm 11-85
Frederick, MD 21702
Phone: (301) 846-1304
Fax: (301) 846-1686
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~


