[R] Handling large data sets via scan()

Christoph Lehmann christoph.lehmann at gmx.ch
Fri Feb 4 10:28:34 CET 2005


Does it solve your problem, at least in part, if you use read.table()
instead of scan(), since it imports the data directly into a
data.frame?

Let me know if it helps.
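
For example, something along these lines (a sketch only; the file
name, separator, header setting and column types are assumptions --
adjust them to your data). Telling read.table() the column classes and
row count up front avoids the type guessing and buffer re-growing that
make it slow and memory-hungry on large files:

## assumed: whitespace-separated file with 600 numeric columns, no header
dat <- read.table("mydata.txt",
                  header       = FALSE,
                  colClasses   = rep("numeric", 600),
                  nrows        = 150000,
                  comment.char = "")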

Nawaaz Ahmed wrote:
> I'm trying to read in datasets with roughly 150,000 rows and 600
> features. I wrote a function using scan() to read it in (I have a 4GB
> linux machine) and it works like a charm.  Unfortunately, converting the
> scanned list into a data.frame using as.data.frame() causes the memory
> usage to explode (it can go from 300MB for the scanned list to 1.4GB for
> a data.frame of 30000 rows) and it fails claiming it cannot allocate
> memory (though it is still not close to the 3GB limit per process on my
> linux box - the message is "unable to allocate vector of size 522K"). 
> 
> So I have three questions --
> 
> 1) Why is it failing even though there seems to be enough memory available?
> 
> 2) Why is converting it into a data.frame causing the memory usage to
> explode? Am I using as.data.frame() wrongly? Should I be using some
> other command?
> 
> 3) All the model fitting packages seem to want to use data.frames as
> their input. If I cannot convert my list into a data.frame what can I
> do? Is there any way of getting around this?
> 
> Much thanks!
> Nawaaz
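
A follow-up on your second and third questions: as.data.frame() on a
big list makes intermediate copies of the components while building
the result, which is where the blow-up comes from. Since a data.frame
is essentially a list with names, row.names and a class attribute, you
can try building one in place rather than converting. A sketch (the
file name, column count and variable names are all illustrative):

## read 600 numeric fields per record into a list, one vector per column
x <- scan("mydata.txt", what = rep(list(0), 600), quiet = TRUE)
n <- length(x[[1]])                       # number of rows actually read
names(x) <- paste("V", seq_along(x), sep = "")
attr(x, "row.names") <- 1:n               # plain integer row names
class(x) <- "data.frame"                  # now usable as a data.frame

Model-fitting functions should then accept x like any other
data.frame, e.g. lm(V1 ~ V2 + V3, data = x), so you may not need
as.data.frame() at all.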



