[R] Slow reshape from 5x600000 to 6311 x 132

Christopher Austin-Lane lanstin at aol.net
Fri Mar 5 05:31:02 CET 2004


I have a dataset of a few hundred thousand rows from a database 
(read in via dbReadTable).  The resulting data frame looks like this:

 > str(measures)
`data.frame':   609363 obs. of  5 variables:
  $ vih.id   : int  1 2 3 4 5 6 7 8 9 10 ...
  $ vi.id    : int  1 2 3 4 5 6 7 8 9 10 ...
  $ vih.value: chr  "0" "1989" "0" "N/A" ...
  $ vih.date : chr  "20040226012314" "20040226012315" "20040226012315" "20040226012315" ...
  $ vih.run.n: int  1 1 1 1 1 1 1 1 1 1 ...
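(The read itself is a single dbReadTable call; a sketch of it is below, 
with the driver, connection details, and table name all guessed for 
illustration:)

 library(DBI)
 library(RMySQL)                                 # driver is a guess
 con      <- dbConnect(MySQL(), dbname = "vih")  # connection details guessed
 measures <- dbReadTable(con, "measures")        # table name guessed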
I'm reshaping it to look like this:

 > str(better)
`data.frame':   132 obs. of  6311 variables:
  $ vih.run.n     : int  1 2 4 5 6 7 8 9 10 11 ...
  $ vih.value.1   : chr  "0" "0" "0" "0" ...
  $ vih.value.2   : chr  "1989" "1989" "1989" "1989" ...
  $ vih.value.3   : chr  "0" "0" "0" "0" ...
  $ vih.value.4   : chr  "N/A" "N/A" "N/A" "N/A" ...
  $ vih.value.5   : chr  "3163979" "3163979" "3163979" "3163979" ...
  $ vih.value.6   : chr  "5500073" "5500073" "5500073" "5500073" ...

(etc., etc.)
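The reshape call itself is essentially the following (a sketch: the 
idvar/timevar/drop choices are reconstructed from the two str() 
listings above rather than copied from my script):

 better <- reshape(measures,
                   idvar     = "vih.run.n",  # one output row per run
                   timevar   = "vi.id",      # becomes the .1, .2, ... suffix
                   v.names   = "vih.value",  # values spread across the columns
                   drop      = c("vih.id", "vih.date"),
                   direction = "wide")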

This takes about 4-8 hours to complete.  Should I

a) try to build the wide format row by row as I fetch the data from 
the DB, instead of using dbReadTable,

or

b) try to tune something in R?  (I'm trying it now with  R 
--min-vsize=600M --min-nsize=6M, although it doesn't seem any faster; 
I won't know for sure for a while.)

(Using home-compiled R 1.8.1 on Mac OS X 10.3.2, under emacs/ESS; my 
R 1.8.1 on Solaris 2.8 has also been churning for a few hours, on a 
split of the data that is 630 variables by 1000 obs.)

--Chris



