[R] questions on csv reading

Gabor Grothendieck ggrothendieck at gmail.com
Sat Sep 26 21:19:19 CEST 2009


2009/9/26 "Jens Oehlschlägel" <oehl_list at gmx.de>:
> Hi,
>
> Is there any official way to determine the colClasses of a data.frame?
> Why has POSIXct such a strange class structure?
> Why is colClasses "ordered" not allowed (and doesn't work)?
>
> Background
> ==========
> I am writing a chunked csv reader that provides the functionality of read.table for large files (in the next version of package ff). In chunked reading, one wants to learn the colClasses from the data.frame returned for the first chunk and submit this as argument colClasses= to the following chunks (following calls to read.table).
>
> for most column types
> colClasses <- sapply(data.frame, class)
> works fine. However, two column types have more than one class:
>
> "ordered" has c("ordered", "factor") - currently we can't tell read.table that a column is an ordered factor

Possibly more complex than one would wish, but it is possible to do this:

Lines <- "A
B
D
C"

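# register the S3 class "ordered" with the methods package
setOldClass("ordered")
# coercion from character; read.table() calls as() for colClasses
# entries it does not handle itself, such as "ordered"
setAs("character", "ordered", function(from) ordered(from))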

DF <- read.table(textConnection(Lines), colClasses = "ordered")
str(DF)
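In the chunked setting described in the question, the coercion can be combined with a helper that turns the classes of the first chunk back into a colClasses vector. This is a minimal sketch under stated assumptions: the data, the chunk boundaries and the helper name col_classes() are made up for illustration and are not part of ff or base R. The helper uses inherits() so that it does not depend on the order of the class vector, which matters for both "ordered" and "POSIXct".

```r
library(methods)

# Coercion so that read.table(colClasses = "ordered") works
setOldClass("ordered")
setAs("character", "ordered", function(from) ordered(from))

# Hypothetical helper: one colClasses entry per column of a data frame.
# inherits() is used so the result does not depend on class-vector order.
col_classes <- function(df) {
  vapply(df, function(x) {
    if (inherits(x, "ordered")) "ordered"
    else if (inherits(x, "POSIXct")) "POSIXct"
    else class(x)[1]
  }, character(1))
}

# Made-up data standing in for a large csv file
csv <- "id,grade
1,A
2,C
3,B
4,A"

# First chunk: header plus 2 rows; id is guessed, grade is read as ordered
first <- read.csv(text = csv, nrows = 2, colClasses = c(NA, "ordered"))
cc <- unname(col_classes(first))

# Next chunk: skip the header and the rows already read,
# and reuse the classes learned from the first chunk
rest <- read.csv(text = csv, header = FALSE, skip = 3,
                 col.names = names(first), colClasses = cc)
str(rest)
```

Note that ordered() takes its levels from the values present in each chunk, so here the first chunk gets levels A < C and the second A < B. A real chunked reader would need to fix the levels (for example from a separate pass over the file) before chunks can be combined consistently.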

> "POSIXct" has c("POSIXt","POSIXct") - here the LESS specific class "POSIXt" is in the first position and would win in class-dispatch over the MORE specific class "POSIXct". Why?
>

It's a historical error that is too late to correct now. See the
discussion in Chambers' recent book.
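The position matters because S3 dispatch walks the class vector from left to right and calls the first method it finds. A small sketch with a made-up generic describe() (hypothetical, defined only for this illustration) and objects whose class vectors are set by hand, so the result does not depend on which order a particular R version uses for POSIXct:

```r
# Hypothetical generic used only to show which S3 method gets picked
describe <- function(x) UseMethod("describe")
describe.POSIXt  <- function(x) "POSIXt method"
describe.POSIXct <- function(x) "POSIXct method"

# Same value (0 seconds since the epoch), two class orders
old_order <- structure(0, class = c("POSIXt", "POSIXct"))
new_order <- structure(0, class = c("POSIXct", "POSIXt"))

describe(old_order)  # "POSIXt method": the less specific class wins
describe(new_order)  # "POSIXct method"
```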



