[Rd] A couple of issues with colClasses/setAs

Peter Dalgaard p.dalgaard at biostat.ku.dk
Wed Sep 8 00:34:23 CEST 2004


Consider this:

$ cat test.dat
1 a
2 b

Now, we want to read the 2nd column as a factor and ignore the first
(since it's just a sequential ID). We can't simply put "factor" among
the colClasses (which would have been nice), so let's try this instead:

> setAs("character","factor",as.factor)
Arguments in definition changed from (x) to (from)
> read.table("test.dat",colClasses=c("numeric","factor"))
Error in inherits(x, "factor") : Object "x" not found

which is a bit peculiar: why does setAs rename the argument when the
result is a function that doesn't work? Judging by the error, the body
still refers to "x". You do need to spell it out:

> setAs("character","factor",function(from)as.factor(from))
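The failure above can be reproduced outside setAs. This is only a sketch of the presumed mechanism, not the actual setAs code: f is a hypothetical stand-in for a function whose body calls its argument "x", and the renaming is done by hand with formals<-.

```r
# Hypothetical stand-in for a function whose body refers to its
# argument as `x` (as the as.factor in the error message evidently did).
f <- function(x) inherits(x, "factor")

# Rename the formal argument only; the body is left untouched.
g <- f
formals(g) <- alist(from = )

# Calling g() now fails because `x` is no longer bound inside the
# function (assuming no variable named `x` is visible elsewhere).
res <- tryCatch(g("a"), error = function(e) conditionMessage(e))
print(res)

# A wrapper whose own argument is `from` avoids the problem.
h <- function(from) f(from)
print(h(factor("a")))
```

The wrapper h plays the same role as the explicit function(from) in the setAs call above.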

And now we get somewhere

> read.table("test.dat",colClasses=c("numeric","factor"))
  V1 V2
1  1  a
2  2  b

but suppose we want to get rid of column 1:

> read.table("test.dat",colClasses=c("NULL","factor"))
Error in data[[i]] : subscript out of bounds

which looks like a pretty clear bug. In contrast, this works fine:

> read.table("test.dat",colClasses=c("NULL","character"))
  V2
1  a
2  b

so the issue only arises when you have nontrivial coercions.

Presumably, the code that handles colClasses miscalculates column
indices in those cases by forgetting the columns that were skipped.
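Until that is fixed, one possible workaround is to lean on the combination that does work: skip column 1, read column 2 as character, and convert afterwards. A minimal sketch, assuming the same two-line file (recreated here in a temporary file so the example is self-contained):

```r
# Recreate the two-line example file in a temporary location.
path <- tempfile(fileext = ".dat")
writeLines(c("1 a", "2 b"), path)

# Skip column 1 and read column 2 as plain character, which works.
d <- read.table(path, colClasses = c("NULL", "character"))

# Convert to a factor in a separate step.
d$V2 <- factor(d$V2)
str(d)
```

This sidesteps the setAs coercion entirely, so it says nothing about where the index calculation goes wrong.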

-- 
   O__  ---- Peter Dalgaard             Blegdamsvej 3  
  c/ /'_ --- Dept. of Biostatistics     2200 Cph. N   
 (*) \(*) -- University of Copenhagen   Denmark      Ph: (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)             FAX: (+45) 35327907


