[R] How to read HUGE data sets?

Emmanuel Charpentier charpent at bacbuc.dyndns.org
Thu Feb 28 15:47:31 CET 2008


Jorge Iván Vélez wrote:
> Dear R-list,
> 
> Does somebody know how I can read a HUGE data set using R? It is a hapmap
> data set (txt format) which is around 4GB. After reading it, I need to delete
> some specific rows and columns. I'm running R 2.6.2 patched over XP SP2
> using a 2.4 GHz Core 2 Duo processor and 4GB RAM. Any suggestion would be
> appreciated.

Hmmm... Unless you're running a 64-bit version of XP, you might be SOL
(notwithstanding the astounding feats of the R Core Team, which managed
to use about 3.5 GB of memory under 32-bit Windows): your *raw* data
will eat more than the available memory. You might be lucky if some of
them can be abstracted (e.g. long character strings that can be recoded
as factors), or unlucky (large R storage overhead for non-reducible
data).
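
If you do try to read the file directly, read.table()'s colClasses
argument lets you skip unneeded columns (set them to "NULL") and store
repetitive strings as factors, which keeps the in-memory footprint
down; nrows/skip let you work through the rows in chunks. A rough
sketch only -- the file name and column layout below are purely
hypothetical, adapt them to your hapmap file:

## hypothetical layout: 1 id column, 10 genotype columns, 5 columns we skip
cc <- c("character", rep("factor", 10), rep("NULL", 5))
d  <- read.table("hapmap.txt", header = TRUE, sep = "\t",
                 colClasses = cc, comment.char = "",
                 nrows = 100000)          # read a manageable chunk
d  <- d[d$chrom == "chr22", ]             # drop unwanted rows early
                                          # (placeholder condition)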

You might consider changing machines: get a 64-bit machine with gobs of
memory and cross your fingers. Note that, since R pointers are then
64 bits wide instead of 32, data storage needs will inflate...

Depending on the real meaning of your data and the processing they
need, you might also consider storing your raw data in a SQL DBMS,
reducing them in SQL and reading into R only the relevant part(s).
There are also some contributed packages that might help in special
situations: biglm, birch.
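
For the DBMS route, something along these lines should work once the
raw file has been imported into, say, an SQLite database (e.g. with the
sqlite3 command-line tool); the database, table and column names below
are only placeholders:

library(DBI)
library(RSQLite)
con <- dbConnect(SQLite(), dbname = "hapmap.db")
## the row/column selection happens on the SQL side, so only the
## (much smaller) result set is ever held in R's memory
sub <- dbGetQuery(con,
    "SELECT rsid, chrom, pos, genotype FROM hapmap WHERE chrom = '22'")
dbDisconnect(con)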

HTH,

					Emmanuel Charpentier
