[R] reading heterogeneous CSV

Gabor Grothendieck ggrothendieck at gmail.com
Wed Aug 12 04:53:59 CEST 2009


This will read it all in, and then you can decide
what you want to do with it:

Lines <- "DISKREAD,metadata about disks
MEM,metadata about memory
ZZZZ,observation-identifier,time,date
DISKREAD,observation-identifier,data about disks
MEM,observation-identifier,data about memory"

DF <- read.table(textConnection(Lines), sep = ",", fill = TRUE)
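
One caveat and one follow-up, as a rough sketch only. read.table
works out the number of columns from the first five lines of input,
so with line lengths that vary a lot it is safer to pass col.names
explicitly. The names n, con and by_type below are just illustrative,
and splitting on the first column is one possible next step, not the
only one:

```r
Lines <- "DISKREAD,metadata about disks
MEM,metadata about memory
ZZZZ,observation-identifier,time,date
DISKREAD,observation-identifier,data about disks
MEM,observation-identifier,data about memory"

# Find the widest line so that no record is wrapped onto a second row.
con <- textConnection(Lines)
n <- max(count.fields(con, sep = ","))
close(con)

# Read everything, padding short lines with empty fields.
con <- textConnection(Lines)
DF <- read.table(con, sep = ",", fill = TRUE,
                 col.names = paste0("V", seq_len(n)),
                 stringsAsFactors = FALSE)
close(con)

# One data frame per record type, keyed by the tag in the first column.
by_type <- split(DF, DF$V1)
names(by_type)
```

After this, by_type$ZZZZ holds the timestamp rows and by_type$MEM the
memory rows, each still padded out to n columns.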


On Tue, Aug 11, 2009 at 2:55 PM, Allen S. Rout <asr at ufl.edu> wrote:
>
>
> Greetings, all.
>
> I've got a datafile I've been working with that has an idiosyncratic,
> heterogeneous format.  It's grossly like:
>
>
> [...]
> DISKREAD,metadata about disks
> MEM,metadata about memory
>
> ZZZZ,observation-identifier,time,date
> DISKREAD,observation-identifier,data about disks
> MEM,observation-identifier,data about memory
>
> [ and repeat for each observation ]
>
> What I've done in the past was take the monolithic file, and
> preprocess it into files, one per observation type.  The observation
> types are structurally self-similar, so once I have them split up,
> normal read.csv methods work just fine.  Then I read the ZZZZ file to
> get timestamps, and whichever observation files I care about on this
> run.
>
>
> But ideally, I'd like to do this entire operation with R features, and
> without multiple passes through the file.
>
> The line lengths vary wildly, so a read.table doesn't help.
>
>
> I was visualizing the following:
>
> + create a FIFO for each desired observation class, including the ZZZZ metadata
> + In one pass through the source file, populate the FIFOs with their data
> + read.csv the output sides of the FIFOs.
>
>
> But I have problems right out of the gate: when I set a data.frame
> element to the output of fifo(), what actually gets inserted seems to
> be an integer; I am guessing it's being turned into a factor.
>
>
> example:
> ----
> desired_slices=c("ZZZZ","DISKWRITE")
> temps = data.frame(slice=desired_slices,row.names=1,handle=I(""))
>
> temps["ZZZZ",] = fifo("./ZZZZ",open="w+")
> showConnections()
>  ( you can see that the connection is open)
> temps
>  ( you can see that the data.frame cell contains the filehandle number)
> -----
>
> Am I just barking up the wrong tree?
>
>
>
> - Allen S. Rout
>
>
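
On the single-pass plan quoted above: the same effect may be possible
without FIFOs, by reading the lines once, grouping them by their
leading tag, and parsing each group separately. This is only a sketch
on the sample data, not tried against the real file; all_lines, tag,
chunks and slices are made-up names, and it assumes a version of R
whose read.csv accepts the text= argument:

```r
# Stand-in for the real file; with a file on disk, readLines() would be
# called on its path instead.
Lines <- "DISKREAD,metadata about disks
MEM,metadata about memory

ZZZZ,observation-identifier,time,date
DISKREAD,observation-identifier,data about disks
MEM,observation-identifier,data about memory"

# Single pass over the input.
con <- textConnection(Lines)
all_lines <- readLines(con)
close(con)

# Drop blank separator lines, then take the text before the first comma
# as the record type.
all_lines <- all_lines[nzchar(all_lines)]
tag <- sub(",.*$", "", all_lines)

# Group the raw lines by record type.
chunks <- split(all_lines, tag)

# Parse each group on its own; fill = TRUE pads short lines.
slices <- lapply(chunks, function(x)
  read.csv(text = x, header = FALSE, fill = TRUE,
           stringsAsFactors = FALSE))

names(slices)
```

Only the wanted types need parsing, e.g. slices[c("ZZZZ", "MEM")]. If
connections really are needed, a plain named list keeps them as
connection objects, whereas a data.frame cell stores only the
underlying integer.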



