[R] Processing large datasets

Mike Marchywka marchywka at hotmail.com
Wed May 25 21:24:01 CEST 2011

----------------------------------------
> Date: Wed, 25 May 2011 12:32:37 -0400
> Subject: Re: [R] Processing large datasets
> From: mailinglist.honeypot at gmail.com
> To: marchywka at hotmail.com
> CC: roman at bestroman.com; r-help at r-project.org
>
> Hi,
>
> On Wed, May 25, 2011 at 11:00 AM, Mike Marchywka wrote:
> [snip]
> >> > If your datasets are *really* huge, check out some packages listed
> >> > under the "Large memory and out-of-memory data" section of the
> >> > "HighPerformanceComputing" task view at CRAN:
> >>
> >> > http://cran.r-project.org/web/views/HighPerformanceComputing.html
> >
> > Does this have any specific limitations ? It sounds offhand like it
> > does paging and all the needed buffering for arbitrary size
> > data. Does it work with everything?
>
> I'm not sure what limitations ... I know the bigmemory (and ff)
> packages try hard to make using out-of-memory datasets as
> "transparent" as possible.
>
> That having been said, I guess you will have to port "more advanced"
> methods to use such packages, hence the existence of the biglm,
> biganalytics, and bigtabulate packages.
>
> > I seem to recall bigmemory came up
> > before in this context and there was some problem.
>
> Well -- I don't often see emails on this list complaining about their
> functionality. That doesn't mean they're flawless (I also don't
> scrutinize the list traffic too closely). It could be that not too
> many people use them, or that people give up before they come knocking
> when there is a problem.
>
> Has something specifically failed for you in the past?

No, I haven't tried it. I may have it confused with something else.
But this question does come up a fair bit, usually in the form of
"I tried to read a huge file into a data frame and wanted to pass
it to something with predictable memory access patterns, and it
ran out of memory. What can I do?" I guess I also stopped reading
anything after "using a DB", as this is generally not a replacement
for a data structure. I'll take a look when I have a big dataset that
I can't condense easily.
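
For the "predictable memory access patterns" case, when the computation
only needs one sequential pass over the rows, one option is to read the
file in fixed-size chunks so that only one chunk is in memory at a time.
Below is a minimal base-R sketch of that idea, assuming a comma-separated
file with a header row and no embedded newlines inside fields. The
function name chunked_mean, the temporary file, and the column name x are
made up for illustration; none of them come from this thread or from any
package.

```r
# Hypothetical helper: mean of one numeric column, computed chunk by chunk
# so the whole file never has to fit in memory at once.
chunked_mean <- function(path, column, chunk_rows = 10000L) {
  con <- file(path, open = "r")
  on.exit(close(con))

  # Read the header once; scan() also strips any quotes around the names.
  header <- scan(text = readLines(con, n = 1L), what = "", sep = ",",
                 quiet = TRUE)

  total <- 0
  n <- 0
  repeat {
    # The open connection remembers its position between calls.
    lines <- readLines(con, n = chunk_rows)
    if (length(lines) == 0L) break
    chunk <- read.csv(text = lines, header = FALSE, col.names = header)
    total <- total + sum(chunk[[column]])
    n <- n + nrow(chunk)
  }
  total / n
}

# Small made-up dataset, just to exercise the function.
tmp <- tempfile(fileext = ".csv")
write.csv(data.frame(id = 1:25, x = as.numeric(1:25)), tmp,
          row.names = FALSE)
result <- chunked_mean(tmp, "x", chunk_rows = 10L)
print(result)
```

This only helps when the statistic can be accumulated across chunks
(sums, counts, cross-products). Anything that needs random access to all
rows at once is where the out-of-memory packages mentioned above would
come in.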

>
> -steve
>
> --
> Steve Lianoglou
> Graduate Student: Computational Systems Biology
>  | Memorial Sloan-Kettering Cancer Center
>  | Weill Medical College of Cornell University
> Contact Info: http://cbio.mskcc.org/~lianos/contact