[R] Help with ddply to eliminate a for..loop

Marc Schwartz marc_schwartz at me.com
Thu Aug 26 22:40:43 CEST 2010


On Aug 26, 2010, at 3:33 PM, Bos, Roger wrote:

> I created a small example to show something that I do a lot of: "scale"
> data by month and return a data.frame with the output.  "id" represents
> repeated observations over "time" and I want to scale the "slope"
> variable.  The "out" variable shows the output I want.  My for..loop
> does the job but is probably very slow versus other methods.  ddply
> seems ideal, but despite playing with the baseball examples quite a bit
> I can't figure out how to get it to work with my sample dataset.  
> 
> TIA for any help, Roger
> 
> Here is the sample code:
> 
> dat <- data.frame(id=rep(letters[1:5],3),
> time=c(rep(1,5),rep(2,5),rep(3,5)), slope=1:15)
> dat
> 
> for (i in 1:3) {
>    mat <- dat[dat$time==i, ]
>    outi <- data.frame(mat$time, mat$id, slope=scale(mat$slope))
>    if (i==1) {
>        out <- outi
>    } else {
>        out <- rbind(out, outi)
>    }
> }
> out
> 
> Here is the sample output:
> 
>> dat <- data.frame(id=rep(letters[1:5],3),
> time=c(rep(1,5),rep(2,5),rep(3,5)), slope=1:15)
> 
>> dat
>   id time slope
> 1   a    1     1
> 2   b    1     2
> 3   c    1     3
> 4   d    1     4
> 5   e    1     5
> 6   a    2     6
> 7   b    2     7
> 8   c    2     8
> 9   d    2     9
> 10  e    2    10
> 11  a    3    11
> 12  b    3    12
> 13  c    3    13
> 14  d    3    14
> 15  e    3    15
> 
>> for (i in 1:3) {
> +     mat <- dat[dat$time==i, ]
> +     outi <- data.frame(mat$time, mat$id, slope=scale(mat$slope))
> +     if (i==1) {
> +         out  .... [TRUNCATED] 
> 
>> out
>   mat.time mat.id      slope
> 1         1      a -1.2649111
> 2         1      b -0.6324555
> 3         1      c  0.0000000
> 4         1      d  0.6324555
> 5         1      e  1.2649111
> 6         2      a -1.2649111
> 7         2      b -0.6324555
> 8         2      c  0.0000000
> 9         2      d  0.6324555
> 10        2      e  1.2649111
> 11        3      a -1.2649111
> 12        3      b -0.6324555
> 13        3      c  0.0000000
> 14        3      d  0.6324555
> 15        3      e  1.2649111
>> 


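Roger, it seems like you might want ave(), which applies a function within
each group and returns the results aligned with the original rows. See ?ave

Note that the cbind() call below leaves two columns named "slope"; the second
one holds the scaled values.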

> cbind(dat, slope = ave(dat$slope, list(dat$time), FUN = scale))
   id time slope      slope
1   a    1     1 -1.2649111
2   b    1     2 -0.6324555
3   c    1     3  0.0000000
4   d    1     4  0.6324555
5   e    1     5  1.2649111
6   a    2     6 -1.2649111
7   b    2     7 -0.6324555
8   c    2     8  0.0000000
9   d    2     9  0.6324555
10  e    2    10  1.2649111
11  a    3    11 -1.2649111
12  b    3    12 -0.6324555
13  c    3    13  0.0000000
14  d    3    14  0.6324555
15  e    3    15  1.2649111
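
A minimal sketch of the same idea that avoids the duplicate column name and
drops the matrix attributes that scale() returns. The names zscore and
slope.scaled are illustrative assumptions, not anything from the original
post. The last few lines show what I believe is the equivalent ddply() call;
it is guarded so that it only runs if the plyr package is installed.

```r
# Sample data from the original post
dat <- data.frame(id = rep(letters[1:5], 3),
                  time = c(rep(1, 5), rep(2, 5), rep(3, 5)),
                  slope = 1:15)

# scale() returns a one-column matrix with attributes; as.numeric()
# reduces it to a plain vector. "zscore" is a hypothetical helper name.
zscore <- function(x) as.numeric(scale(x))

# ave() applies zscore within each level of time and returns a vector in
# the original row order. "slope.scaled" is a hypothetical column name.
res <- transform(dat, slope.scaled = ave(slope, time, FUN = zscore))
res

# Assumed ddply equivalent: split by time, then transform each piece.
if (requireNamespace("plyr", quietly = TRUE)) {
  res2 <- plyr::ddply(dat, "time", transform,
                      slope.scaled = as.numeric(scale(slope)))
  print(res2)
}
```

With this sample data every time group has the same spread, so each group
gets the same five scaled values as in the output above.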


HTH,

Marc Schwartz


