[R] disabling NA token as na.string in read.table

Peter Dalgaard BSA p.dalgaard at biostat.ku.dk
Thu Dec 19 23:17:03 CET 2002


Vadim Ogranovich <vograno at arbitrade.com> writes:

> Dear R-Users,
> 
> I have a csv file that contains NA tokens, and these tokens are perfectly good
> values that should not be converted to NA by read.table(). I tried to
> prevent the conversion by specifying the na.strings argument, but this seems
> only to add to the list of NA strings rather than replace it.
> 
> > system("cat foo")
> 1 foo
> 2 NA
> > read.table("foo", na.strings="foo")
>   V1 V2
> 1  1 NA
> 2  2 NA
> 
> 
> This is R1.6.0 on Linux.
> 
> What did I do wrong?

Hmm, this looks like a bit of a bug. read.table() ends up calling
type.convert() with its default na.strings="NA", whatever you passed
to read.table() itself. Now, if "NA" had been in the na.strings for
read.table(), scan() would already have turned it into <NA> by that
point, so I suspect you would have preferred that call to use
na.strings=character(0). But that has the side effect of turning the
real NA into a factor level:

> x <- c(NA,"NA","foo")
> type.convert(x)
[1] <NA> <NA> foo
Levels: foo
> type.convert(x,na.strings=character(0))
[1] <NA> NA   foo
Levels: NA foo NA
> dput(type.convert(x,na.strings=character(0)))
structure(c(3, 1, 2), .Label = c("NA", "foo", NA), class = "factor")

I.e., it looks like the internals of type.convert() need some fixing up.
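
In the meantime, one way to sidestep the automatic conversion might be
to declare the column types yourself, so that the "NA" column is read
as plain character. A minimal sketch, assuming a two-column file like
yours; the temporary file and the name dat are only for illustration,
and it relies on the colClasses and na.strings arguments as documented
in current versions of R, so behaviour in 1.6.0 may differ:

```r
# Recreate the two-line example file in a temporary location
# (the path is illustrative and stands in for "foo").
path <- tempfile()
writeLines(c("1 foo", "2 NA"), path)

# colClasses fixes the column types up front, so the second column is
# read as character and not converted automatically.
# na.strings = character(0) declares that no token means "missing".
dat <- read.table(path, colClasses = c("integer", "character"),
                  na.strings = character(0))

# Inspect the result: V2 should hold the literal strings, not <NA>.
print(dat)
print(anyNA(dat$V2))

unlink(path)
```

Checking the result with anyNA() as above seems prudent, since the
handling of na.strings has evidently not been consistent.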

-- 
   O__  ---- Peter Dalgaard             Blegdamsvej 3  
  c/ /'_ --- Dept. of Biostatistics     2200 Cph. N   
 (*) \(*) -- University of Copenhagen   Denmark      Ph: (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)             FAX: (+45) 35327907



