[R] converting large dataframes to matrix (was: large dataframes in ASCII)

Ott Toomet siim at obs.ee
Sun Aug 11 17:16:35 CEST 2002


Hi,

True, write.matrix does quite a good job if the data are already in matrix
form.  The problem arises with real data (a labour force survey in my case),
which include variables of different storage modes.  The data frame I used
contains mostly integers and factors in character form (although most of its
cells are NA).

My computer has 128MB of memory, and R (1.5.1) took 52MB once the data frame
e2000 (7500x1200) was loaded.  I then tried to transform it to a matrix:

f2000 <- as.matrix(e2000)

R grew to 155MB, after which I killed the process.  So in this case the
block size does not help much.
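
A minimal sketch of what I think is going on, using a made-up miniature
(e_small and its columns are hypothetical, not the real e2000): when a data
frame mixes integers and factors, as.matrix has to pick one common storage
mode for the result, so every cell, NA or not, ends up in a character matrix.

```r
# Hypothetical miniature of a mixed-mode survey data frame (not the real e2000)
e_small <- data.frame(
  age    = c(34L, NA, 51L),
  region = factor(c("north", NA, "south"))
)

# Both columns are stored as integers inside the data frame
# (a factor is a vector of integer codes plus a levels attribute)
sapply(e_small, typeof)

# The matrix needs a single storage mode; with a factor present it is character
m <- as.matrix(e_small)
typeof(m)

# Compare the footprint of the two representations on your own machine
object.size(e_small)
object.size(m)
```

On a 7500x1200 frame the same coercion applies to all 9 million cells, which
would be consistent with the memory growth I saw, though I have not measured
where exactly the memory goes.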

Best wishes,

Ott




On Sun, 11 Aug 2002 ripley at stats.ox.ac.uk wrote:

  |The sort of `large' here is 7500x1200.  That's 72Mb if real numbers, so
  |let's assume you have at least 256Mb to use.  I ran the following on
  |Windows with a 256Mb limit (and I had to use R-devel to do so). I actually
  |found it difficult to create a data frame of that size in 256Mb, and
  |resorted to
  |
  |A1 <- vector("list", 1000)
  |for(i in 1:1000) A1[[i]] <- rnorm(8000)
  |class(A1) <- "data.frame"
  |row.names(A1) <- 1:8000
  |
  |which took 15 secs and 140Mb as an underhand way to make a data frame.
  |(1.5.1 took too much memory here.)
  |
  |Then
  |
  |A2 <- as.matrix(A1)
  |
  |took 1.8secs (hardly slow) and an additional 64Mb to hold the object A2.
  |I then deleted A1.  Running
  |
  |write.table(A2, "foo.dat", blocksize=1000)
  |

You mean write.matrix?

  |used about 150Mb in about four minutes.  That is formatting 8 million
  |numbers, and 85% of the time was spent in the system calls, as one should
  |expect.  (I suspect I did not need to delete A1, but didn't want to wait
  |around to find out.)
  |
  |So
  |
  |1) you could have checked your claims by some simple experiments.
  |
  |2) as claimed, write.matrix does indeed do the job.

Agreed, given that there is sufficient memory and/or the data are of a
homogeneous storage mode.
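
For the mixed-mode case, one possible workaround is to skip the conversion
altogether and write the data frame out a block of rows at a time, so that
only one block is formatted in memory at once.  The following is only a
sketch: write_in_blocks is a hypothetical helper of mine, not a function from
R or MASS, and the tab separator and unquoted output are arbitrary choices.

```r
# Hypothetical helper: write a data frame in row blocks with write.table,
# appending each block to the file, without calling as.matrix on the whole frame.
write_in_blocks <- function(df, file, blocksize = 1000) {
  n <- nrow(df)
  if (n == 0) {
    # Nothing to slice; write just the header line
    write.table(df, file, row.names = FALSE, sep = "\t", quote = FALSE)
    return(invisible(file))
  }
  starts <- seq(1, n, by = blocksize)
  for (s in starts) {
    rows <- s:min(s + blocksize - 1, n)
    write.table(df[rows, , drop = FALSE], file,
                append    = s > 1,    # first block creates the file
                col.names = s == 1,   # header only once
                row.names = FALSE, sep = "\t", quote = FALSE)
  }
  invisible(file)
}

# Usage on a small made-up data frame with an integer and a factor column
toy <- data.frame(a = 1:25,
                  b = factor(rep(c("x", "y", NA, "z", "w"), 5)))
out <- tempfile(fileext = ".dat")
write_in_blocks(toy, out, blocksize = 10)
length(readLines(out))   # header line plus one line per row
```

I have not timed this on a frame of the size discussed here, so whether it is
fast enough in practice remains to be seen.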





