[R] comparing reshape's
ivowel at gmail.com
Fri Jun 11 22:36:01 CEST 2010
I thought I would share the following.
System: Mac Pro 2.26GHz, OSX, 8GB of memory (not a constraint), R
2.11.0, 64bit version.
Task: I have a data set in long format with 2.2 million observations
(factor xid, factor yid, variable zcontent), which I want to map into a
sparse matrix of 948 columns and 16,350 rows. There are two commonly
used ways to accomplish this:
outcome = reshape( subset(mydataframe, select=c(yid,xid,zcontent)),
timevar="yid", idvar="xid", direction="wide" )
takes about 9,600 seconds.
library(reshape)
melted = melt( subset(mydataframe, select=c(yid,xid,zcontent)),
id=c("xid", "yid") )
outcome = cast( melted, xid ~ yid )
takes about 875 seconds.
So, for large reshape jobs from long to wide, melt() and cast() from the
reshape package are much more efficient than base R's reshape(): roughly
eleven times faster here. YMMV.
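For anyone who wants to try the comparison at a smaller scale, here is a minimal sketch on made-up data. The 200 x 50 size, the random zcontent values and the names long, wide_base and wide_cast are my own assumptions for illustration, not the data set above, and the requireNamespace() guard is only there so that the script still runs when the reshape package is not installed. Timings on a toy data set this small will not reproduce the gap reported above.

```r
# Toy long-format data: 200 xid levels x 50 yid levels, one row per pair.
# Sizes and values are invented for illustration only.
set.seed(1)
long <- data.frame(
  xid      = factor(rep(1:200, times = 50)),
  yid      = factor(rep(1:50, each = 200)),
  zcontent = rnorm(200 * 50)
)

# Base R: one row per xid, one zcontent.<yid> column per yid level.
t_base <- system.time(
  wide_base <- reshape(long, timevar = "yid", idvar = "xid",
                       direction = "wide")
)
cat("base reshape(), elapsed seconds:", t_base[["elapsed"]], "\n")

# reshape package: melt to (xid, yid, variable, value), then cast to wide.
if (requireNamespace("reshape", quietly = TRUE)) {
  t_cast <- system.time({
    melted    <- reshape::melt(long, id.vars = c("xid", "yid"))
    wide_cast <- reshape::cast(melted, xid ~ yid)
  })
  cat("melt() + cast(), elapsed seconds:", t_cast[["elapsed"]], "\n")
}
```

Scaling up the two sizes in the data.frame() call is the easiest way to see how each approach behaves as the job grows.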
Ivo Welch (ivo.welch at brown.edu, ivo.welch at gmail.com)