[R] How to deal with more than 6GB dataset using R?

Jens Oehlschlägel jens.oehlschlaegel at truecluster.com
Wed Jul 28 10:51:42 CEST 2010


Matthew,

You might want to look at the function read.table.ffdf in the ff package, which can read large csv files in chunks and store the result in a binary format on disk that can be quickly accessed from R. ff allows you to access complete columns (returned as a vector or array) or subsets of the data identified by row positions (and column selection, returned as a data.frame). As Jim pointed out, it all depends on what you are doing with the data. If you want to access subsets not by row position but by search conditions, you are better off with an indexed database.
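
As a rough sketch of the chunked reading and the two kinds of access described above, something like the following should work. The temporary csv file, the column names id and value, and the chunk sizes are made up for illustration and stand in for your real data; read.csv.ffdf is the csv convenience wrapper around read.table.ffdf.

```r
library(ff)  # assumes the ff package is installed

# Hypothetical example data: a small csv file standing in for a large one
csv_file <- tempfile(fileext = ".csv")
n <- 10000
write.csv(data.frame(id = seq_len(n), value = seq_len(n) / 2),
          csv_file, row.names = FALSE)

# Read the file in chunks: 1000 rows first, then 5000 rows per further chunk.
# The result is an ffdf whose columns live in binary files on disk.
dat <- read.csv.ffdf(file = csv_file, header = TRUE,
                     first.rows = 1000, next.rows = 5000)

dim(dat)                                 # rows and columns of the ffdf

# Access a complete column, returned as an ordinary R vector
value_col <- dat$value[]

# Access a subset by row positions and column selection,
# returned as a data.frame
rows <- dat[101:105, c("id", "value")]
```

For a file of several GB you would normally also choose the chunk sizes (first.rows, next.rows) with your available RAM in mind.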

Please let me know if you write a fast read.fwf.ffdf - we would be happy to include it in the ff package.


Jens


