[R] Handling large dataset & dataframe

Liaw, Andy andy_liaw at merck.com
Mon Apr 24 21:07:22 CEST 2006


Instead of reading the entire data set in at once, you read it a chunk at a
time, compute X'X and X'y on each chunk, and accumulate (i.e., add) them.
The sums over all chunks equal the X'X and X'y of the full data, which is all
a least-squares fit needs. There are examples in "S Programming", taken from
independent replies by the two authors to a post on S-news, if I remember
correctly.
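
A minimal sketch of the accumulation described above, not the code from
"S Programming". It assumes a comma-separated file with a header line and
all-numeric columns; the function name chunked_ols, the argument names, and
the response column name "y" are made up for illustration. For scale:
350,000 x 266 doubles is about 745 MB, while the accumulated X'X is only
about 266 x 266 doubles, well under 1 MB.

```r
# Least squares from chunk-wise cross products (illustrative sketch).
# chunked_ols, response and chunk_size are hypothetical names.
chunked_ols <- function(path, response = "y", chunk_size = 10000) {
  con <- file(path, open = "r")
  on.exit(close(con))

  # Read the header line once; later reads continue from the current position.
  cols <- scan(con, what = "", sep = ",", nlines = 1, quiet = TRUE)
  predictors <- setdiff(cols, response)

  xtx <- 0  # running sum of X'X over chunks
  xty <- 0  # running sum of X'y over chunks
  repeat {
    chunk <- tryCatch(
      read.csv(con, header = FALSE, col.names = cols, nrows = chunk_size),
      error = function(e) NULL  # read.csv fails once no lines are left
    )
    if (is.null(chunk) || nrow(chunk) == 0) break

    # Design matrix for this chunk, with an intercept column.
    X <- cbind("(Intercept)" = 1, as.matrix(chunk[predictors]))
    y <- chunk[[response]]

    xtx <- xtx + crossprod(X)     # X'X for this chunk
    xty <- xty + crossprod(X, y)  # X'y for this chunk

    if (nrow(chunk) < chunk_size) break  # short chunk: end of file
  }

  # Solve the normal equations (X'X) b = X'y.
  drop(solve(xtx, xty))
}
```

Usage would be along the lines of chunked_ols("mydata.csv", response = "y"),
where the file name is a placeholder. Solving the normal equations directly is
less numerically stable than the QR decomposition that lm() uses, and
solve() will fail if X'X is singular, for example when a full set of dummy
columns is included alongside the intercept.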

Andy

From: Sachin J
> 
> Gabor:
>    
>   Can you elaborate more?
>    
>   Thanks
>   Sachin
> 
> Gabor Grothendieck <ggrothendieck at gmail.com> wrote:
>   You just need the much smaller cross-product matrix X'X and the
> vector X'y, so you can build those up as you read the data in,
> chunk by chunk.
> 
> 
> On 4/24/06, Sachin J wrote:
> > Hi,
> >
> > I have a dataset consisting of 350,000 rows and 266 columns. Of the
> > 266 columns, 250 are dummy variable columns. I am trying to read this
> > data set into an R data frame object but am unable to do so due to
> > memory size limitations (the object created is too large to handle
> > in R). Is there a way to handle such a large dataset in R?
> >
> > My PC has 1 GB of RAM and 55 GB of hard disk space, and runs Windows XP.
> >
> > Any pointers would be of great help.
> >
> > TIA
> > Sachin
> >
> >
> > ______________________________________________
> > R-help at stat.math.ethz.ch mailing list 
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide! 
> > http://www.R-project.org/posting-guide.html
> >