[R] Processing large datasets

Steve Lianoglou mailinglist.honeypot at gmail.com
Wed May 25 18:32:37 CEST 2011


Hi,

On Wed, May 25, 2011 at 11:00 AM, Mike Marchywka <marchywka at hotmail.com> wrote:
[snip]
>> > If your datasets are *really* huge, check out some packages listed
>> > under the "Large memory and out-of-memory data" section of the
>> > "HighPerformanceComputing" task view at CRAN:
>>
>> > http://cran.r-project.org/web/views/HighPerformanceComputing.html
>
> Does this have any specific limitations? It sounds offhand like it
> does paging and all the needed buffering for arbitrary-size
> data. Does it work with everything?

I'm not sure what limitations there are ... I do know the bigmemory
(and ff) packages try hard to make working with out-of-memory
datasets as "transparent" as possible.

That having been said, I suspect you have to port "more advanced"
methods to work with such packages, hence the existence of the
biglm, biganalytics, and bigtabulate packages.
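
To illustrate why the porting matters: biglm fits a linear model
incrementally, one chunk at a time, so the full dataset never has to
be in RAM at once. A toy sketch with simulated chunks (make_chunk is
just a stand-in for reading the next piece of your data from disk):

library(biglm)

## Stand-in for reading the next chunk of real data.
make_chunk <- function(n) {
  x <- rnorm(n)
  data.frame(x = x, y = 2 * x + rnorm(n))
}

## Fit on the first chunk, then fold in the rest with update();
## biglm keeps only the model's running summary statistics, not
## the raw data.
fit <- biglm(y ~ x, data = make_chunk(1000))
for (i in 1:9) fit <- update(fit, make_chunk(1000))

summary(fit)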

> I seem to recall bigmemory came up
> before in this context and there was some problem.

Well -- I don't often see emails on this list complaining about their
functionality. That doesn't mean they're flawless (I also don't
scrutinize the list traffic too closely). It could be that not too
many people use them, or that people give up before they come knocking
when there is a problem.

Has something specific failed for you in the past?

-steve

-- 
Steve Lianoglou
Graduate Student: Computational Systems Biology
 | Memorial Sloan-Kettering Cancer Center
 | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact


