[R] reshape to wide format takes extremely long

Coen van Hasselt coenvanhasselt at gmail.com
Thu Sep 2 09:24:21 CEST 2010


Hello,

I have a data.frame with the following format:

> head(clin2)
    Study Subject  Type      Obs Cycle Day       Date  Time
1 A001101   10108   ALB 44.00000    98   1 2004-03-11 14:26
2 A001101   10108   ALP 95.00000    98   1 2004-03-11 14:26
3 A001101   10108   ALT 61.00000    98   1 2004-03-11 14:26
5 A001101   10108   AST 33.00000    98   1 2004-03-11 14:26

I want to transform this data.frame so that I have "Obs" columns for
each "Type". The full dataset is 45000 rows long. For a short subset
of 100 rows, reshaping takes 0.2 seconds, and produces what I want.
All columns are either numeric or character format (incl. date/time).

> reshape(clin2, v.names="Obs", timevar="Type", direction="wide", idvar=c("Study","Subject","Cycle","Day","Date","Time"))
      Study Subject Cycle Day       Date  Time Obs.ALB Obs.ALP Obs.ALT Obs.AST
1   A001101   10108    98   1 2004-03-11 14:26      44      95      61      33
11  A001101   10108     1   1 2004-03-12 14:01      41      85      39      33
21  A001101   10108     1   8 2004-03-22 10:34      40      90      70      34
30  A001101   10108     1  15 2004-03-29 09:56      45      97      66      48
[........]
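
For anyone who wants to try this without my data, here is a minimal sketch of the same call on a small made-up data frame. The name `toy` and all values in it are invented for illustration; only the column layout follows clin2.

```r
# Toy long-format data; values are invented, only the column layout
# mirrors clin2 (one row per Type per visit).
toy <- data.frame(
  Study   = "A001101",
  Subject = 10108,
  Type    = rep(c("ALB", "ALP", "ALT", "AST"), times = 2),
  Obs     = c(44, 95, 61, 33, 41, 85, 39, 33),
  Cycle   = rep(c(98, 1), each = 4),
  Day     = 1,
  Date    = rep(c("2004-03-11", "2004-03-12"), each = 4),
  Time    = rep(c("14:26", "14:01"), each = 4),
  stringsAsFactors = FALSE
)

# Same reshape call as above: one "Obs.<Type>" column per Type,
# one row per unique combination of the id columns.
wide <- reshape(toy, v.names = "Obs", timevar = "Type", direction = "wide",
                idvar = c("Study", "Subject", "Cycle", "Day", "Date", "Time"))
print(wide)
```

On this toy input the result has one row per visit and the columns Obs.ALB, Obs.ALP, Obs.ALT and Obs.AST, which is the layout I am after.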

However, when using the same reshape command for the full data.frame
of 45000 rows, it still wasn't finished when run overnight (8 GB RAM +
8 GB swap in use).

Going from a 100-row subset to a 1000-row subset, the processing time
increases from 0.2 sec to 60 sec.
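
In case it helps to reproduce the scaling, this is roughly how such timings can be measured. It is only a sketch on invented data: make_long() and time_reshape() are hypothetical helpers made up for this example, the sizes are kept small so it finishes quickly, and the numbers will differ from the ones I see on clin2.

```r
# Hypothetical helper: builds n_rows of invented long-format data with
# the same column layout as clin2 (4 Types per visit).
make_long <- function(n_rows) {
  types <- c("ALB", "ALP", "ALT", "AST")
  n_visits <- ceiling(n_rows / length(types))
  visit <- rep(seq_len(n_visits), each = length(types))[seq_len(n_rows)]
  data.frame(
    Study   = "A001101",
    Subject = 10000 + visit %/% 10,
    Type    = rep(types, length.out = n_rows),
    Obs     = round(runif(n_rows, 30, 100)),
    Cycle   = visit %% 10,
    Day     = visit,
    Date    = as.character(as.Date("2004-03-11") + visit),
    Time    = "14:26",
    stringsAsFactors = FALSE
  )
}

# Hypothetical helper: elapsed seconds for the reshape call on n_rows rows.
time_reshape <- function(n_rows) {
  d <- make_long(n_rows)
  system.time(
    reshape(d, v.names = "Obs", timevar = "Type", direction = "wide",
            idvar = c("Study", "Subject", "Cycle", "Day", "Date", "Time"))
  )[["elapsed"]]
}

sizes <- c(100, 200, 400)
timings <- data.frame(rows = sizes, seconds = sapply(sizes, time_reshape))
print(timings)
```

If the time grew linearly, doubling the number of rows should roughly double the seconds column.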

I would greatly appreciate any advice on why the time for reshaping
grows so much faster than the number of rows (10 times the rows takes
300 times as long), and how I can do this in an elegant way.

Thanks!

Coen.


