[R] reading in data with variable length
andy_liaw at merck.com
Tue Dec 6 16:16:03 CET 2005
Using a file() connection in conjunction with readLines() and strsplit()
should do it. I would try to count the number of lines in the file first,
create a list with that many components, and then fill it in. I believe the
"array of cells" in Matlab is roughly equivalent to a list in R, but that's
beyond my knowledge of Matlab...
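
Something along these lines is a rough sketch of that idea, not a tuned
solution. The function name read_ragged, the chunk_size argument, and the
component names (name, start_month, data) are made up for illustration. It
assumes exactly one header line, plain commas with no quoted fields, and a
numeric data part; I have not timed it on a 1GB file.

```r
# Hypothetical helper: read a ragged CSV into a list of records.
# Assumes one header line and no quoted or embedded commas.
read_ragged <- function(path, chunk_size = 10000L) {
  # First pass: count lines so the result list can be preallocated.
  con <- file(path, open = "r")
  n <- 0L
  while (length(chunk <- readLines(con, n = chunk_size)) > 0L) {
    n <- n + length(chunk)
  }
  close(con)

  n_rec <- max(n - 1L, 0L)           # subtract the header line
  out <- vector("list", n_rec)

  # Second pass: split each chunk on commas and fill the list in place.
  con <- file(path, open = "r")
  on.exit(close(con))
  readLines(con, n = 1L)             # read and discard the header
  i <- 0L
  while (length(chunk <- readLines(con, n = chunk_size)) > 0L) {
    fields <- strsplit(chunk, ",", fixed = TRUE)
    for (f in fields) {
      i <- i + 1L
      out[[i]] <- list(name        = f[1L],
                       start_month = as.integer(f[2L]),
                       data        = as.numeric(f[-(1:2)]))
    }
  }
  out
}
```

With that, ta <- read_ragged("foo.csv") should give a list where
ta[[i]]$data holds only the values actually present in record i, so there
is no NA padding. Reading in chunks keeps only chunk_size lines of raw text
in memory at a time, at the cost of going through the file twice.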
From: John McHenry
> I have very large csv files (up to 1GB each of ASCII text).
> I'd like to be able to read them directly in to R. The
> problem I am having is with the variable length of the data
> in each record.
> Here's a (simplified) example:
> $ cat foo.csv
> Name,Start Month,Data
> The records consist of rows with some set comma-separated
> fields (e.g. the "Name" & "Start Month" fields in the above)
> and then the data follow as a variable-length list of
> comma-separated values until a new line is encountered.
> Now I can use e.g.
> ta <- read.csv(fileName, header=FALSE, skip=1, sep=",", dec=".", fill=TRUE)
> which does the job nicely:
>    V1 V2      V3     V4     V5      V6      V7     V8    V9
> 1 Foo 10 -0.5615 2.3065 0.1589 -0.3649  1.5955     NA    NA
> 2 Bar 21  0.0880 0.5733 0.0081  2.0253 -0.7602 0.7765 0.281
>      V10    V11    V12    V13     V14     V15    V16     V17
> 1     NA     NA     NA     NA      NA      NA     NA      NA
> 2 1.8546 0.2696 0.3316 0.1565 -0.4847 -0.1325 0.0454 -1.2114
> but the problem is with files on the order of 1GB this
> either crunches for ever or runs out of memory trying ...
> plus having all those NAs isn't too pretty to look at.
> (I have a MATLAB version that can read this stuff into an
> array of cells in about 3 minutes).
> I really want a fast way to read the data part into a list;
> that way I can access the data in the list of records
> by doing something like ta[[i]]$data.