R-beta: read.table and large datasets

Douglas Bates bates at stat.wisc.edu
Mon Mar 9 19:56:02 CET 1998


Rick White <rick at stat.ubc.ca> writes:

> I find that read.table cannot handle large datasets. Suppose data is a
> 40000 x 6 dataset
> 
> R -v 100
> 
> x <- read.table("data")  gives
> Error: memory exhausted
> but
> x <- as.data.frame(matrix(scan("data"), byrow=T, ncol=6))
> works fine.
> 
> read.table is less typing, I can include the variable names in the first
> line, and in S-PLUS it executes faster. Is there a fix for read.table on
> the way?

You probably need to increase -n as well as -v to read in this table.
Try setting 
 gcinfo(TRUE)
to see what is happening with the garbage collector.  Most likely it
is running out of cons cells long before it runs out of heap storage.
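For example (a minimal sketch; the -v/-n options are those of the R builds
of this era, which later versions replaced with --min-vsize/--min-nsize,
and the counts shown are only illustrative):

  ## Start R with a larger vector heap (-v, in Mb) and more cons cells (-n):
  ##   R -v 100 -n 1000000
  gcinfo(TRUE)              # report memory usage at each garbage collection
  x <- read.table("data")   # watch whether cons cells or the heap run out
  gc()                      # Ncells = cons cells, Vcells = vector heap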

I suspect this because I encountered exactly the same situation several
weeks ago, and Thomas Lumley pointed this out to me.
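
For reference, the scan() idiom quoted above can also keep the variable
names if the first line of the file holds them; a minimal sketch (assuming
scan() accepts an nlines argument, as current versions do, and using the
file name and column count from the example):

  nms <- scan("data", what = "", nlines = 1)        # read the header line
  x <- as.data.frame(matrix(scan("data", skip = 1),
                            byrow = TRUE, ncol = 6))
  names(x) <- nms                                    # attach variable names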
