[R] Using cumsum with 'group by' ?

arun smartpink111 at yahoo.com
Fri Nov 23 15:59:57 CET 2012


Hi,

If that is the case, this should work:
dat1<-read.table(text="
id,          x,          date
1,          5,          2012-06-05 12:01
1,          10,        2012-06-05 12:02
1,          45,        2012-06-05 12:03
2,          5,          2012-06-05 12:01
2,          3,          2012-06-05 12:03
2,          2,          2012-06-05 12:05
3,          5,          2012-06-05 12:03
3,          5,          2012-06-05 12:04
3,          8,          2012-06-05 12:05
1,          5,          2012-06-08 13:01
1,          9,          2012-06-08 13:02
1,          3,          2012-06-08 13:03
2,          0,          2012-06-08 13:15
2,          1,          2012-06-08 13:18
2,          8,          2012-06-08 13:20
2,          4,          2012-06-08 13:21
3,          6,          2012-06-08 13:15
3,          2,          2012-06-08 13:16
3,          7,          2012-06-08 13:17
3,          2,          2012-06-08 13:18
",sep=",",header=TRUE,stringsAsFactors=FALSE)
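# Keep only the calendar day; as.Date() drops the time of day
dat1$date<-as.Date(dat1$date,format="%Y-%m-%d %H:%M")
# Sort by id, then by day; tied rows keep their original (time) order
dat2<-dat1[order(dat1[,1],dat1[,3]),]
# Running total of x within each id/day combination
dat2$Cumsum<-ave(dat2$x,list(dat2$id,dat2$date),FUN=cumsum)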

head(dat2)
#   id  x       date Cumsum
#1   1  5 2012-06-05      5
#2   1 10 2012-06-05     15
#3   1 45 2012-06-05     60
#10  1  5 2012-06-08      5
#11  1  9 2012-06-08     14
#12  1  3 2012-06-08     17
#or
with(dat2,aggregate(x,by=list(id=id,date=date),cumsum))
#  id       date            x
#1  1 2012-06-05    5, 15, 60
#2  2 2012-06-05     5, 8, 10
#3  3 2012-06-05    5, 10, 18
#4  1 2012-06-08    5, 14, 17
#5  2 2012-06-08  0, 1, 9, 13
#6  3 2012-06-08 6, 8, 15, 17
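
The two calls differ in shape: ave() returns one value per row, while aggregate() returns one row per group with the running totals packed together. A minimal sketch of why, assuming a small made-up sample (the vectors id, day and x below are illustrative, not the poster's data): splitting x by group, applying cumsum to each piece, and putting the pieces back in row order gives the same per-row result as ave().

```r
# Hypothetical sample: two ids, two days, rows interleaved on purpose.
id  <- c(1, 1, 2, 2, 1, 1)
day <- c("2012-06-05", "2012-06-05", "2012-06-05",
         "2012-06-05", "2012-06-08", "2012-06-08")
x   <- c(5, 10, 5, 3, 5, 9)

# One factor level per id/day combination that actually occurs.
g <- interaction(id, day, drop = TRUE)

# Per-group running totals, as a list with one element per group.
pieces <- lapply(split(x, g), cumsum)

# Put the pieces back into the original row positions.
running <- unsplit(pieces, g)

print(running)
print(ave(x, id, day, FUN = cumsum))
```

Both print statements should show 5 15 5 8 5 14 for this sample. The list in pieces corresponds to what aggregate() reports per group; unsplit() is the step that turns it back into a column.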
A.K.



----- Original Message -----
From: TheRealJimShady <james.david.smith at gmail.com>
To: r-help at r-project.org
Cc: 
Sent: Friday, November 23, 2012 6:04 AM
Subject: Re: [R] Using cumsum with 'group by' ?

Hi Arun & everyone,

Thank you very much for your helpful suggestions. I've been working
through them, but have realised that my data is a little more
complicated than I said and that the solutions you've kindly provided
don't work. The problem is that there is more than one day of data for
each person. It looks like this:

id          x          date
1          5          2012-06-05 12:01
1          10        2012-06-05 12:02
1          45        2012-06-05 12:03
2          5          2012-06-05 12:01
2          3          2012-06-05 12:03
2          2          2012-06-05 12:05
3          5          2012-06-05 12:03
3          5          2012-06-05 12:04
3          8          2012-06-05 12:05
1          5          2012-06-08 13:01
1          9          2012-06-08 13:02
1          3          2012-06-08 13:03
2          0          2012-06-08 13:15
2          1          2012-06-08 13:18
2          8          2012-06-08 13:20
2          4          2012-06-08 13:21
3          6          2012-06-08 13:15
3          2          2012-06-08 13:16
3          7          2012-06-08 13:17
3          2          2012-06-08 13:18

So what I need to do is something like this (in pseudo code anyway):

- Order the data by the id field and then the date field
- add a new variable called cumsum
- calculate this variable as the cumulative sum of x, grouped
by id and date (note: the date only, not the date and time).
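
The three steps above can be sketched in base R roughly as follows. This is only an illustration under stated assumptions: the sample rows are a made-up, deliberately unsorted subset in the same shape as the data above, the helper columns stamp and day are names introduced here, and tz = "UTC" is a guess at the time zone. The full timestamp is kept for sorting, and only the calendar day is used for grouping.

```r
# Hypothetical sample in the same shape as the data above, deliberately unsorted.
dat <- data.frame(
  id   = c(2, 1, 1, 2, 1, 1, 1),
  x    = c(3, 10, 5, 5, 9, 45, 5),
  date = c("2012-06-05 12:03", "2012-06-05 12:02", "2012-06-05 12:01",
           "2012-06-05 12:01", "2012-06-08 13:02", "2012-06-05 12:03",
           "2012-06-08 13:01"),
  stringsAsFactors = FALSE
)

# Parse the full timestamp so the time of day is available for sorting.
# tz = "UTC" is an assumption; use the zone the data was recorded in.
dat$stamp <- as.POSIXct(dat$date, format = "%Y-%m-%d %H:%M", tz = "UTC")

# Calendar day only, used as the grouping key.
dat$day <- as.Date(dat$stamp, tz = "UTC")

# Step 1: order by id, then by the full timestamp.
dat <- dat[order(dat$id, dat$stamp), ]

# Steps 2 and 3: cumulative sum of x within each id/day group.
dat$cumsum <- ave(dat$x, dat$id, dat$day, FUN = cumsum)

print(dat[, c("id", "x", "date", "cumsum")])
```

For this sample the cumsum column comes out as 5, 15, 60, 5, 14 for id 1 (restarting on 2012-06-08) and 5, 8 for id 2. Sorting on the parsed timestamp rather than on the day alone means the result does not depend on the rows already being in time order within each day.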

Thank you

James





On 23 November 2012 03:54, arun kirshna [via R]
<ml-node+s789695n4650505h81 at n4.nabble.com> wrote:
> Hi,
> No problem.
> One more method if you wanted to try:
> library(data.table)
> dat2<-data.table(dat1)
> dat2[,list(x,time,Cumsum=cumsum(x)),list(id)]
>  #   id  x  time Cumsum
>  #1:  1  5 12:01      5
>  #2:  1 14 12:02     19
>  #3:  1  6 12:03     25
>  #4:  1  3 12:04     28
>  #5:  2 98 12:01     98
>  #6:  2 23 12:02    121
>  #7:  2  1 12:03    122
>  #8:  2  4 12:04    126
>  #9:  3  5 12:01      5
> #10:  3 65 12:02     70
> #11:  3 23 12:03     93
> #12:  3 23 12:04    116
>
>
> A.K.
>
>
>
> ----- Original Message -----
> From: TheRealJimShady <[hidden email]>
> To: [hidden email]
> Cc:
> Sent: Thursday, November 22, 2012 12:27 PM
> Subject: Re: [R] Using cumsum with 'group by' ?
>
> Thank you very much, I will try these tomorrow morning.
>
> On 22 November 2012 17:25, arun kirshna [via R]
> <[hidden email]> wrote:
>
>> Hi,
>> You can do this in many ways:
>> dat1<-read.table(text="
>> id    time    x
>> 1   12:01    5
>> 1   12:02   14
>> 1   12:03   6
>> 1   12:04   3
>> 2   12:01   98
>> 2   12:02   23
>> 2   12:03   1
>> 2   12:04   4
>> 3   12:01   5
>> 3   12:02   65
>> 3   12:03   23
>> 3   12:04   23
>> ",sep="",header=TRUE,stringsAsFactors=FALSE)
>>  dat1$Cumsum<-ave(dat1$x,dat1$id,FUN=cumsum)
>> #or
>>  unlist(tapply(dat1$x,dat1$id,FUN=cumsum),use.names=FALSE)
>> # [1]   5  19  25  28  98 121 122 126   5  70  93 116
>> #or
>> library(plyr)
>>  ddply(dat1,.(id),function(x) cumsum(x[3]))[,2]
>> # [1]   5  19  25  28  98 121 122 126   5  70  93 116
>> head(dat1)
>> #  id  time  x Cumsum
>> #1  1 12:01  5      5
>> #2  1 12:02 14     19
>> #3  1 12:03  6     25
>> #4  1 12:04  3     28
>> #5  2 12:01 98     98
>> #6  2 12:02 23    121
>> A.K.
>>
>>
>>
>>
>
>
>
>
> --
> View this message in context:
> http://r.789695.n4.nabble.com/Using-cumsum-with-group-by-tp4650457p4650461.html
> Sent from the R help mailing list archive at Nabble.com.
>     [[alternative HTML version deleted]]
>
> ______________________________________________
> [hidden email] mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
>




--
View this message in context: http://r.789695.n4.nabble.com/Using-cumsum-with-group-by-tp4650457p4650538.html
Sent from the R help mailing list archive at Nabble.com.




