[R] How to read HUGE data sets?

Roland Rau roland.rproject at gmail.com
Thu Feb 28 22:47:55 CET 2008


Hi,

Jorge Iván Vélez wrote:
> Dear R-list,
> 
> Does somebody know how can I read a HUGE data set using R? It is a hapmap
> data set (txt format) which is around 4GB. After read it, I need to delete
> some specific rows and columns. I'm running R 2.6.2 patched over XP SP2

in such a case, I would recommend not using R for the first step. Try 
awk[1] to cut out the rows and columns you need. If the resulting 
data are still very large, I would suggest reading them into a 
database system. My experience in that respect is limited: I have 
only used SQLite. But in conjunction with the RSQLite package, it 
handled all my "big data problems".

Check http://www.ibm.com/developerworks/library/l-awk1.html to get you 
smoothly started with awk.
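
The awk step itself could look something like this (the column 
numbers, the "chr1" condition, and the file names are made up; adjust 
them to the rows and columns you actually need):

## run something like this at the command line first:
##
##   awk -F'\t' '$2 == "chr1" {print $1"\t"$2"\t"$4}' hapmap.txt > small.txt
##
## the filtered file should then be small enough for read.table()
dat <- read.table("small.txt", sep = "\t", header = FALSE,
                  stringsAsFactors = FALSE)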

I hope this helps,
Roland

[1] I think the gawk implementation offers the most options (e.g. for 
timing), but I recently used mawk on Windows XP and it was way faster 
(or was it nawk?). If you don't already have experience with a 
language such as Perl, I'd say awk is much easier to learn than Perl.


