[R] Using cumsum with 'group by' ?

Pete Brecknock Peter.Brecknock at bp.com
Fri Nov 23 16:15:39 CET 2012


TheRealJimShady wrote
> Hi Peter,
> 
> Yes, I did miss an e from the first 'not' in the brackets at the end
> of the message, sorry.
> 
> Thanks for that code, but when I use it, it creates a new column
> called csum which simply contains the values of the variable x . i.e.
> it just duplicates the values from x into the new column.  I guess
> this means that the grouping isn't occurring correctly?
> 
> James
> 
> On 23 November 2012 13:57, Peter Dalgaard-2 [via R]
> <

> ml-node+s789695n4650550h76 at .nabble

> > wrote:
>>
>> On Nov 23, 2012, at 12:04 , TheRealJimShady wrote:
>>
>>> Hi Arun & everyone,
>>>
>>> Thank you very much for your helpful suggestions. I've been working
>>> through them, but have realised that my data is a little more
>>> complicated than I said and that the solutions you've kindly provided
>>> don't work. The problem is that there is more than one day of data for
>>> each person. It looks like this:
>>>
>>> id          x          date
>>> 1          5          2012-06-05 12:01
>>> 1          10        2012-06-05 12:02
>>> 1          45        2012-06-05 12:03
>>> 2          5          2012-06-05 12:01
>>> 2          3          2012-06-05 12:03
>>> 2          2          2012-06-05 12:05
>>> 3          5          2012-06-05 12:03
>>> 3          5          2012-06-05 12:04
>>> 3          8          2012-06-05 12:05
>>> 1          5          2012-06-08 13:01
>>> 1          9          2012-06-08 13:02
>>> 1          3          2012-06-08 13:03
>>> 2          0          2012-06-08 13:15
>>> 2          1          2012-06-08 13:18
>>> 2          8          2012-06-08 13:20
>>> 2          4          2012-06-08 13:21
>>> 3          6          2012-06-08 13:15
>>> 3          2          2012-06-08 13:16
>>> 3          7          2012-06-08 13:17
>>> 3          2          2012-06-08 13:18
>>>
>>> So what I need to do is something like this (in pseudo code anyway):
>>>
>>> - Order the data by the id field and then the date field
>>> - add a new variable called cumsum
>>> - calculate this variable as the cumulative value of X, but grouping
>>> by the id and date (not date, not date and time).
>>
>> Did you miss an 'e' on the first 'not'?? Otherwise, I'm confused.
>>
>> Why do people always forget about ave()? I'd try
>>
>> newdata <- transform(mydata, csum=ave(x, id, as.Date(date), FUN=cumsum))
>>
>> You still need to sort, of course, at least by date (incl. time). Check
>> carefully whether time zone is an issue for as.Date --- you may need the
>> tz
>> argument.
>>
>> Also notice that I shy using an R function name as a variable name. This
>> is
>> mostly superstition these days, but better safe than sorry.
>>
>>
>>>
>>> Thank you
>>>
>>> James
>>>
>>>
>>>
>>>
>>>
>>> On 23 November 2012 03:54, arun kirshna [via R]
>>> <[hidden email]> wrote:
>>>> Hi,
>>>> No problem.
>>>> One more method if you wanted to try:
>>>> library(data.table)
>>>> dat2<-data.table(dat1)
>>>> dat2[,list(x,time,Cumsum=cumsum(x)),list(id)]
>>>> #   id  x  time Cumsum
>>>> #1:  1  5 12:01      5
>>>> #2:  1 14 12:02     19
>>>> #3:  1  6 12:03     25
>>>> #4:  1  3 12:04     28
>>>> #5:  2 98 12:01     98
>>>> #6:  2 23 12:02    121
>>>> #7:  2  1 12:03    122
>>>> #8:  2  4 12:04    126
>>>> #9:  3  5 12:01      5
>>>> #10:  3 65 12:02     70
>>>> #11:  3 23 12:03     93
>>>> #12:  3 23 12:04    116
>>>>
>>>>
>>>> A.K.
>>>>
>>>>
>>>>
>>>> ----- Original Message -----
>>>> From: TheRealJimShady <[hidden email]>
>>>> To: [hidden email]
>>>> Cc:
>>>> Sent: Thursday, November 22, 2012 12:27 PM
>>>> Subject: Re: [R] Using cumsum with 'group by' ?
>>>>
>>>> Thank you very much, I will try these tomorrow morning.
>>>>
>>>> On 22 November 2012 17:25, arun kirshna [via R]
>>>> <[hidden email]> wrote:
>>>>
>>>>> HI,
>>>>> You can do this in many ways:
>>>>> dat1<-read.table(text="
>>>>> id    time    x
>>>>> 1   12:01    5
>>>>> 1   12:02   14
>>>>> 1   12:03   6
>>>>> 1   12:04   3
>>>>> 2   12:01   98
>>>>> 2   12:02   23
>>>>> 2   12:03   1
>>>>> 2   12:04   4
>>>>> 3   12:01   5
>>>>> 3   12:02   65
>>>>> 3   12:03   23
>>>>> 3   12:04   23
>>>>> ",sep="",header=TRUE,stringsAsFactors=FALSE)
>>>>> dat1$Cumsum<-ave(dat1$x,dat1$id,FUN=cumsum)
>>>>> #or
>>>>> unlist(tapply(dat1$x,dat1$id,FUN=cumsum),use.names=FALSE)
>>>>> # [1]   5  19  25  28  98 121 122 126   5  70  93 116
>>>>> #or
>>>>> library(plyr)
>>>>> ddply(dat1,.(id),function(x) cumsum(x[3]))[,2]
>>>>> # [1]   5  19  25  28  98 121 122 126   5  70  93 116
>>>>> head(dat1)
>>>>> #  id  time  x Cumsum
>>>>> #1  1 12:01  5      5
>>>>> #2  1 12:02 14     19
>>>>> #3  1 12:03  6     25
>>>>> #4  1 12:04  3     28
>>>>> #5  2 12:01 98     98
>>>>> #6  2 12:02 23    121
>>>>> A.K.
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> ________________________________
>>>>> If you reply to this email, your message will be added to the
>>>>> discussion
>>>>> below:
>>>>>
>>>>>
>>>>> http://r.789695.n4.nabble.com/Using-cumsum-with-group-by-tp4650457p4650459.html
>>>>> To unsubscribe from Using cumsum with 'group by' ?, click here.
>>>>> NAML
>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> View this message in context:
>>>>
>>>> http://r.789695.n4.nabble.com/Using-cumsum-with-group-by-tp4650457p4650461.html
>>>> Sent from the R help mailing list archive at Nabble.com.
>>>>    [[alternative HTML version deleted]]
>>>>
>>>> ______________________________________________
>>>> [hidden email] mailing list
>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>> PLEASE do read the posting guide
>>>> http://www.R-project.org/posting-guide.html
>>>> and provide commented, minimal, self-contained, reproducible code.
>>>>
>>>>
>>>> ______________________________________________
>>>> [hidden email] mailing list
>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>> PLEASE do read the posting guide
>>>> http://www.R-project.org/posting-guide.html
>>>> and provide commented, minimal, self-contained, reproducible code.
>>>>
>>>>
>>>> ________________________________
>>>> If you reply to this email, your message will be added to the
>>>> discussion
>>>> below:
>>>>
>>>> http://r.789695.n4.nabble.com/Using-cumsum-with-group-by-tp4650457p4650505.html
>>>> To unsubscribe from Using cumsum with 'group by' ?, click here.
>>>> NAML
>>>
>>>
>>>
>>>
>>> --
>>> View this message in context:
>>> http://r.789695.n4.nabble.com/Using-cumsum-with-group-by-tp4650457p4650538.html
>>
>>> Sent from the R help mailing list archive at Nabble.com.
>>> [[alternative HTML version deleted]]
>>>
>>> ______________________________________________
>>> [hidden email] mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide
>>> http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>
>> --
>> Peter Dalgaard, Professor,
>> Center for Statistics, Copenhagen Business School
>> Solbjerg Plads 3, 2000 Frederiksberg, Denmark
>> Phone: (+45)38153501
>> Email: [hidden email]  Priv: [hidden email]
>>
>> ______________________________________________
>> [hidden email] mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>>
>> ________________________________
>> If you reply to this email, your message will be added to the discussion
>> below:
>> http://r.789695.n4.nabble.com/Using-cumsum-with-group-by-tp4650457p4650550.html
>> To unsubscribe from Using cumsum with 'group by' ?, click here.
>> NAML

Peter Dalgaard's suggestion works for me ...

lines<-"id x date time
1 5 2012-06-05 12:01 
1 10 2012-06-05 12:02 
1 45 2012-06-05 12:03 
2 5 2012-06-05 12:01 
2 3 2012-06-05 12:03 
2 2 2012-06-05 12:05 
3 5 2012-06-05 12:03 
3 5 2012-06-05 12:04 
3 8 2012-06-05 12:05 
1 5 2012-06-08 13:01 
1 9 2012-06-08 13:02 
1 3 2012-06-08 13:03 
2 0 2012-06-08 13:15 
2 1 2012-06-08 13:18 
2 8 2012-06-08 13:20 
2 4 2012-06-08 13:21 
3 6 2012-06-08 13:15 
3 2 2012-06-08 13:16 
3 7 2012-06-08 13:17 
3 2 2012-06-08 13:18 
"

# read in data
dat1<-read.table(textConnection(lines), header=TRUE,stringsAsFactors=FALSE) 

# build csum variable
newdata <- transform(dat1, csum=ave(x, id, as.Date(date), FUN=cumsum)) 

# order data (not really necessary)
newdata.ord <-newdata[order(dat1[,"id"],dat1[,"date"],dat1[,"time"]),]

Or have I misinterpreted your request?

HTH

Pete (B not D)



--
View this message in context: http://r.789695.n4.nabble.com/Using-cumsum-with-group-by-tp4650457p4650556.html
Sent from the R help mailing list archive at Nabble.com.




More information about the R-help mailing list