[R] Processing large datasets

Mike Marchywka marchywka at hotmail.com
Wed May 25 17:00:57 CEST 2011




----------------------------------------
> Date: Wed, 25 May 2011 10:18:48 -0400
> From: roman at bestroman.com
> To: mailinglist.honeypot at gmail.com
> CC: r-help at r-project.org
> Subject: Re: [R] Processing large datasets
>
> > Hi,
> > If your datasets are *really* huge, check out some packages listed
> > under the "Large memory and out-of-memory data" section of the
> > "HighPerformanceComputing" task view at CRAN:
>
> > http://cran.r-project.org/web/views/HighPerformanceComputing.html

Does this have any specific limitations? Offhand, it sounds like it
handles paging and all the buffering needed for data of arbitrary
size. Does it work with everything? I seem to recall bigmemory came up
before in this context and there was some problem.

Thanks.



>
> > Also, if you find yourself needing to do lots of
> > "grouping/summarizing" type of calculations over large data
> > frame-like objects, you might want to check out the data.table package:
>
> > http://cran.r-project.org/web/packages/data.table/index.html
>
> > --
> > Steve Lianoglou
> > Graduate Student: Computational Systems Biology
> > | Memorial Sloan-Kettering Cancer Center
> > | Weill Medical College of Cornell University
> > Contact Info: http://cbio.mskcc.org/~lianos/contact
>
> I don't think data.table is fundamentally different from the data.frame type, but thanks for the suggestion.
>
> http://cran.r-project.org/web/packages/data.table/vignettes/datatable-intro.pdf
> "Just like data.frames, data.tables must fit inside RAM"
>
> The ff package by Adler, listed in "Large memory and out-of-memory data" is probably most interesting.
>
> --Roman Naumenko

