[R] comparing reshape's

ivo welch ivowel at gmail.com
Fri Jun 11 22:36:01 CEST 2010


I thought I would share the following.

System: Mac Pro 2.26GHz, OSX, 8GB of memory (not a constraint), R
2.11.0, 64bit version.

Task:  I have a long data set of 2.2 million observations (factor
xid, factor yid, variable zcontent), which I want to map into a sparse
matrix of 948 columns and 16,350 rows.  There are two commonly used
functions to accomplish this:

   library(stats)
   # subset() is closed before the reshape() arguments begin
   outcome = reshape( subset(mydataframe, select=c(yid,xid,zcontent)),
                      timevar="yid", idvar="xid", direction="wide" )

takes about 9,600 seconds.

   library(reshape)
   # melt() gets the id columns; cast() works on the melted data frame
   melted = melt( subset(mydataframe, select=c(yid,xid,zcontent)),
                  id=c("xid", "yid") )
   outcome = cast( melted, xid ~ yid )

takes about 875 seconds.
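
For anyone who wants to try the base reshape() call on something small first, here is a minimal sketch. The toy values are made up for illustration; only the column names (xid, yid, zcontent) come from the task above.

```r
# Toy long-format data using the column names from the post (values are made up).
mydataframe <- data.frame(
  xid = factor(c("a", "a", "b", "b", "c")),
  yid = factor(c("y1", "y2", "y1", "y3", "y2")),
  zcontent = c(1.5, 2.0, 3.5, 4.0, 5.5)
)

# Long to wide with base R: one row per xid, one zcontent column per yid.
wide <- reshape(mydataframe, timevar = "yid", idvar = "xid", direction = "wide")

# (xid, yid) pairs that never occur in the long data come out as NA,
# which is where the sparseness of the wide result shows up.
print(wide)
```

The result is a data frame with one row per xid level and one value column per yid level, plus the xid column itself.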


So, for large reshape jobs from long to wide, the reshape package is
much more efficient, about 11 times faster here.  YMMV.
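
A rough sketch of how such a timing can be taken with system.time(), on synthetic data far smaller than mine. The sizes, the seed, and the name "long" are assumptions chosen only so that it runs quickly; the same wrapper can be put around the melt/cast calls.

```r
# Synthetic long data, far smaller than the 2.2 million rows in the post.
set.seed(1)
n_x <- 200
n_y <- 50
grid <- expand.grid(xid = seq_len(n_x), yid = seq_len(n_y))

# Sample rows without replacement so every (xid, yid) pair is unique.
long <- grid[sample(nrow(grid), 3000), ]
long$xid <- factor(long$xid)
long$yid <- factor(long$yid)
long$zcontent <- rnorm(nrow(long))

# Time the base R long-to-wide reshape; "elapsed" is wall-clock seconds.
timing <- system.time(
  wide <- reshape(long, timevar = "yid", idvar = "xid", direction = "wide")
)
print(timing["elapsed"])
```

Timings on a toy like this say little about the full-size job, so scale the sizes up before drawing conclusions.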

/iaw

----
Ivo Welch (ivo.welch at brown.edu, ivo.welch at gmail.com)


