[R] Conditional sum

mathijsdevaan mathijsdevaan at gmail.com
Mon Feb 21 16:54:28 CET 2011


I am still struggling (I'm an R novice). Basically, for each row I just want
the sum of C over all rows of the same group whose year is earlier than that
row's year. I have the feeling that a loop would work, but I am not really
familiar with loops. Something like this?

for(DF$C in 1:length(DF$C))
	{
	DF<-which(DF$year<DF[i,"year"])
	DF$D<-ave(DF$C,DF$group,FUN = function(x) sum(x))
	}

It doesn't work and probably looks awful, so can someone point me in the
right direction? Thanks!

M
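The following is a minimal sketch of how the loop in the question might be written so that it runs. It assumes `DF` has the `group`, `year` and `C` columns used in the quoted examples below. The small data frame here is a subset of those rows, including the duplicated b1/a2 rows for 2001. The loop index is `i` rather than `DF$C`, and `DF` itself is never overwritten. Each row gets the sum of `C` over rows of the same group with a strictly earlier year, so same-year duplicates do not count each other.

```r
# Subset of the example data quoted below (b1 and a2 appear twice in 2001)
DF <- data.frame(
  group = c("b1", "a2", "b1", "a2", "b1", "a2", "b1", "a2"),
  year  = c(1999, 1999, 2001, 2001, 2004, 2004, 2001, 2001),
  C     = c(0.25, 0.25, 0.5, 0.5, 0.33, 0.33, 0.5, 0.5)
)

DF$D <- 0
for (i in seq_len(nrow(DF))) {
  # TRUE for rows of the same group observed in a strictly earlier year
  prior <- DF$group == DF$group[i] & DF$year < DF$year[i]
  # sum() of an empty selection is 0, so the first year of a group gets 0
  DF$D[i] <- sum(DF$C[prior])
}
print(DF)
```

Because every row is compared with every other row, the run time grows with the square of the number of rows. That is fine for small data but may be slow for large sets.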


mathijsdevaan wrote:
> 
> 
> Dieter Menne wrote:
>> 
>> Here it is step by step, in the order you would think it through. You
>> could shorten this considerably. I slightly changed your column names to
>> ones that are easier to type.
>> 
>> Dieter
>> 
>> 
>> DF = data.frame(read.table(textConnection("    group  year  C
>> 1 b1  1999  0.25
>> 2 c1  1999  0.25
>> 3 d1  1999  0.25
>> 4 a2  1999  0.25
>> 5 c2  1999  0.25
>> 6 d2  1999  0.25
>> 7 a3  1999  0.25
>> 8 b3  1999  0.25
>> 9 d3  1999  0.25
>> 10 a4  1999  0.25
>> 11 b4  1999  0.25
>> 12 c4  1999  0.25
>> 13 b1  2001  0.5
>> 14 a2  2001  0.5
>> 15 b1  2004  0.33
>> 16 c1  2004  0.33
>> 17 a2  2004  0.33
>> 18 c2  2004  0.33
>> 19 a3  2004  0.33
>> 20 b3  2004  0.33
>> 21 d2  1980  0.4
>> 22 a3  1980  0.4
>> 23 b4  1981  0.4
>> 24 c1  1981  0.4"),header=TRUE))
>> 
>> by(DF,DF$group, FUN = function(x){
>>   str(x)
>> })
>> # Looks like we should order...
>> # Other solutions are possible, but ordering all first might (not tested)
>> # be the most efficient way for large sets
>> DF = DF[order(DF$group,DF$year),]
>> # Let's try cumsum on each group
>> by(DF,DF$group, FUN = function(x){
>>   cumsum(x$C)
>> })
>> # That's not exactly your definition of "prior":
>> # subtract each row's own value so only earlier rows are counted
>> by(DF,DF$group, FUN = function(x){
>>   cumsum(x$C)-x$C
>> })
>> # Now that the data are in the right order, build the result vector
>> DF$D = unlist(by(DF,DF$group, FUN = function(x){
>>   cumsum(x$C)
>> }))
>> # You could sort by the row names, converted with as.numeric(), to
>> # restore the old order
>> 
>> 
> 
> Thanks for the quick response, but it doesn't do the trick. There are two
> problems:
> 1. The ith value of the new variable DF$D also includes the ith value of
> DF$C (this is easily solved with DF$D = DF$D-DF$C).
> 2. If a group appears more than once in the same year t, the value for the
> second observation of that group includes the C value of the first one,
> although neither observation is prior to the other. Example (groups b1 and
> a2 are duplicated in 2001):
> 
> DF = data.frame(read.table(textConnection("    group  year  C 
> 1 b1  1999  0.25 
> 2 c1  1999  0.25 
> 3 d1  1999  0.25 
> 4 a2  1999  0.25 
> 5 c2  1999  0.25 
> 6 d2  1999  0.25 
> 7 a3  1999  0.25 
> 8 b3  1999  0.25 
> 9 d3  1999  0.25 
> 10 a4  1999  0.25 
> 11 b4  1999  0.25 
> 12 c4  1999  0.25 
> 13 b1  2001  0.5 
> 14 a2  2001  0.5 
> 15 b1  2004  0.33 
> 16 c1  2004  0.33 
> 17 a2  2004  0.33 
> 18 c2  2004  0.33 
> 19 a3  2004  0.33 
> 20 b3  2004  0.33 
> 21 d2  1980  0.4 
> 22 a3  1980  0.4 
> 23 b4  1981  0.4 
> 24 c1  1981  0.4
> 25 b1  2001  0.5 
> 26 a2  2001  0.5"),header=TRUE))
> 
> by(DF,DF$group, FUN = function(x){str(x)})
> 
> DF = DF[order(DF$group,DF$year),] 
> 
> by(DF,DF$group, FUN = function(x){cumsum(x$C)}) 
> 
> by(DF,DF$group, FUN = function(x){cumsum(x$C)-x$C}) 
> 
> DF$D = unlist(by(DF,DF$group, FUN = function(x){cumsum(x$C)}))
> 
> DF$D = DF$D-DF$C
> 
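Both problems can be avoided by collapsing same-year duplicates before taking the running sum. Below is a sketch in base R. It is an alternative to the `by()`/`cumsum` route, not what was proposed in the thread. It totals `C` per group and year with `aggregate()` and takes a per-group running sum of those yearly totals with `ave()`. It then subtracts the current year's total (problem 1) and maps the result back to every row with `match()`, so duplicated group-years get the same value (problem 2). The names `tot` and `key` are illustrative only. Because `DF` is never reordered, no sorting by row names is needed afterwards.

```r
# The second example data set from this thread (b1 and a2 duplicated in 2001)
DF <- read.table(header = TRUE, text = "
group year C
b1 1999 0.25
c1 1999 0.25
d1 1999 0.25
a2 1999 0.25
c2 1999 0.25
d2 1999 0.25
a3 1999 0.25
b3 1999 0.25
d3 1999 0.25
a4 1999 0.25
b4 1999 0.25
c4 1999 0.25
b1 2001 0.5
a2 2001 0.5
b1 2004 0.33
c1 2004 0.33
a2 2004 0.33
c2 2004 0.33
a3 2004 0.33
b3 2004 0.33
d2 1980 0.4
a3 1980 0.4
b4 1981 0.4
c1 1981 0.4
b1 2001 0.5
a2 2001 0.5
")

# 1. Total C per group and year, so same-year duplicates become one row
tot <- aggregate(C ~ group + year, data = DF, FUN = sum)

# 2. Within each group (sorted by year), running total of the yearly sums
#    minus the current year's own total = sum over strictly earlier years
tot <- tot[order(tot$group, tot$year), ]
tot$D <- ave(tot$C, tot$group, FUN = cumsum) - tot$C

# 3. Look up each original row's (group, year) in the per-year table;
#    DF keeps its original row order throughout
key <- match(paste(DF$group, DF$year), paste(tot$group, tot$year))
DF$D <- tot$D[key]
print(DF)
```

The loop version gives the same numbers but compares all pairs of rows. Here, the expensive work is done once per group-year.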

-- 
Sent from the R help mailing list archive at Nabble.com.


