[R] applying cumsum within groups

David Winsemius dwinsemius at comcast.net
Fri Apr 3 18:17:20 CEST 2015


On Apr 3, 2015, at 5:17 AM, Morway, Eric wrote:

> This small example will be applied to a problem with 1.4e6 lines of data.
> First, here is the dataset and a few lines of R script, followed by an
> explanation of what I'd like to get:
> 
> dat <- read.table(textConnection("ISEG  IRCH  val
> 1    1   265
> 1    2   260
> 1    3   234
> 54   39   467
> 54   40   468
> 54   41   460
> 54   42   489
> 1    1   265
> 1    2   276
> 1    3   217
> 54   39   456
> 54   40   507
> 54   41   483
> 54   42   457
> 1    1   265
> 1    2   287
> 1    3   224
> 54   39   473
> 54   40   502
> 54   41   497
> 54   42   447
> 1    1   230
> 1    2   251
> 1    3   199
> 54   39   439
> 54   40   474
> 54   41   477
> 54   42   413
> 1    1   230
> 1    2   262
> 1    3   217
> 54   39   455
> 54   40   493
> 54   41   489
> 54   42   431
> 1    1   1002
> 1    2   1222
> 1    3   1198
> 54   39   1876
> 54   40   1565
> 54   41   1455
> 54   42   1427
> 1    1   1002
> 1    2   1246
> 1    3   1153
> 54   39   1813
> 54   40   1490
> 54   41   1518
> 54   42   1486
> 1    1   1002
> 1    2   1229
> 1    3   1142
> 54   39   1797
> 54   40   1517
> 54   41   1527
> 54   42   1514"),header=TRUE)
> 
> dat$seq <- ifelse(dat$ISEG==1 & dat$IRCH==1, 1, 0)
> tmp <- diff(dat[dat$seq==1,]$val)!=0
> dat$idx <- 0
> dat[dat$seq==1,][c(TRUE,tmp),]$idx <- 1
> dat$ts <- cumsum(dat$idx)
> 
> At this point, I'd like to add one more column called "iter" that counts up
> by 1 based on "seq", but within each "ts".  So, the result would look like
> this (undoubtedly this is a simple problem with something like ddply, but
> I've been unable to construct the R for it):

> dat$iter2 <- ave(dat$seq, dat$ts,FUN=cumsum)
> dat
   ISEG IRCH  val seq idx ts iter iter2
1     1    1  265   1   1  1  1_1     1
2     1    2  260   0   0  1  1_1     1
3     1    3  234   0   0  1  1_1     1
4    54   39  467   0   0  1  1_1     1
5    54   40  468   0   0  1  1_1     1
6    54   41  460   0   0  1  1_1     1
7    54   42  489   0   0  1  1_1     1
8     1    1  265   1   0  1  1_2     2
9     1    2  276   0   0  1  1_2     2
10    1    3  217   0   0  1  1_2     2
11   54   39  456   0   0  1  1_2     2
12   54   40  507   0   0  1  1_2     2
13   54   41  483   0   0  1  1_2     2
14   54   42  457   0   0  1  1_2     2
15    1    1  265   1   0  1  1_3     3
16    1    2  287   0   0  1  1_3     3
17    1    3  224   0   0  1  1_3     3
18   54   39  473   0   0  1  1_3     3
19   54   40  502   0   0  1  1_3     3
20   54   41  497   0   0  1  1_3     3
21   54   42  447   0   0  1  1_3     3
22    1    1  230   1   1  2  2_4     1
23    1    2  251   0   0  2  2_4     1
snipped----->
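
For anyone reading this in the archive, here is a minimal sketch of why the ave() call above gives the requested counter. ave() splits its first argument by the grouping variable, applies FUN to each piece, and returns the results in the original row order. Because "seq" is 1 only on the first row of each block, a cumulative sum within each "ts" numbers the blocks 1, 2, 3, ... The data frame "toy" and its values are made up for illustration and are not part of the original data.

```r
# Hypothetical toy data: 'seq' marks the first row of each block,
# 'ts' is the group within which the counter should restart.
toy <- data.frame(
  seq = c(1, 0, 0, 1, 0, 1, 0, 0, 1, 0),
  ts  = c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2)
)

# Cumulative sum of 'seq' computed separately within each 'ts' group;
# the result has one value per row, in the original row order.
toy$iter <- ave(toy$seq, toy$ts, FUN = cumsum)

print(toy$iter)
# [1] 1 1 1 2 2 1 1 1 2 2
```

The same call applied to dat$seq and dat$ts is what produces the "iter2" column shown in the output above.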

-- 
David
> 
> dat
> ISEG IRCH  val seq idx ts iter
>    1    1  265   1   1  1    1
>    1    2  260   0   0  1    1
>    1    3  234   0   0  1    1
>   54   39  467   0   0  1    1
>   54   40  468   0   0  1    1
>   54   41  460   0   0  1    1
>   54   42  489   0   0  1    1
>    1    1  265   1   0  1    2
>    1    2  276   0   0  1    2
>    1    3  217   0   0  1    2
>   54   39  456   0   0  1    2
>   54   40  507   0   0  1    2
>   54   41  483   0   0  1    2
>   54   42  457   0   0  1    2
>    1    1  265   1   0  1    3
>    1    2  287   0   0  1    3
>    1    3  224   0   0  1    3
>   54   39  473   0   0  1    3
>   54   40  502   0   0  1    3
>   54   41  497   0   0  1    3
>   54   42  447   0   0  1    3
>    1    1  230   1   1  2    1
>    1    2  251   0   0  2    1
>    1    3  199   0   0  2    1
>   54   39  439   0   0  2    1
>   54   40  474   0   0  2    1
>   54   41  477   0   0  2    1
>   54   42  413   0   0  2    1
>    1    1  230   1   0  2    2
>    1    2  262   0   0  2    2
>    1    3  217   0   0  2    2
>   54   39  455   0   0  2    2
>   54   40  493   0   0  2    2
>   54   41  489   0   0  2    2
>   54   42  431   0   0  2    2
>    1    1 1002   1   1  3    1
>    1    2 1222   0   0  3    1
>    1    3 1198   0   0  3    1
>   54   39 1876   0   0  3    1
>   54   40 1565   0   0  3    1
>   54   41 1455   0   0  3    1
>   54   42 1427   0   0  3    1
>    1    1 1002   1   0  3    2
>    1    2 1246   0   0  3    2
>    1    3 1153   0   0  3    2
>   54   39 1813   0   0  3    2
>   54   40 1490   0   0  3    2
>   54   41 1518   0   0  3    2
>   54   42 1486   0   0  3    2
>    1    1 1002   1   0  3    3
>    1    2 1229   0   0  3    3
>    1    3 1142   0   0  3    3
>   54   39 1797   0   0  3    3
>   54   40 1517   0   0  3    3
>   54   41 1527   0   0  3    3
>   54   42 1514   0   0  3    3
> 

David Winsemius
Alameda, CA, USA


