[R] Summing identical IDs

David Winsemius dwinsemius at comcast.net
Sun Nov 1 00:04:23 CET 2009


On Oct 30, 2009, at 3:29 PM, PDXRugger wrote:

>
> David,
>  You are correct.  I think the frist two assumptions can be thrown  
> out and
> only the latter two (c,d) can be considered.  So how would i combine  
> Acres
> for matching Bldgids based on assumptions c,d?
>
>
I also  think your requested output failed to add the ImpValue of rows  
10 and 11 (but did add for  cases 5 & 6 as well as 8 & 9. So it's not  
exactly clear what you want.

>
> David Winsemius wrote:
>>
>>
>> On Oct 29, 2009, at 5:23 PM, PDXRugger wrote:
>>
>>>
>>> Terrific help thank you.
>>> dupbuild<-aggregate(DF$Acres, list(Bldgid), sum)
>>> This line worked best.
>>>
>>> Now im going to challenge everyone (i think?)
>>>
>>> Consider the following:
>>>
>>>
>>> Acres<-c(100,101,100,130,156,.5,293,300,.09,100,12.5)
>>> Bldgid<-c(1,2,3,4,5,5,6,7,7,8,8)
>>> Year<-c(1946,1952,1922,1910,1955,1955,1999,1990,1991,2000,2000)
>>> ImpValue<-c(1000,1400,1300,900,5000,1200,500,1000,300,1000,1000)
>>> DF=cbind(Acres,Bldgid,Year,ImpValue)
>>> DF<-as.data.frame(DF)
>>>
>>> I would like to do the same, except there are some rules i want to
>>> follow.
>>> I only want to aggregate the Acres if :
>>> a) The Years are not identical
>>> b) The ImpValues are not identical

So we are striking out a and b and we only want to sum values where  
the following obtain:


>>> c) The Years are identical and the ImpValue are not
>>> d)The ImpValues are identical and the Years are not

Do an outer join on the DF:
mrgDF <- merge(DF, DF, by="Bldgid", all=T)

# and identify cases within that construct that satisfy the non- 
equalities:

 > with( mrgDF, mrgDF[ImpValue.x != ImpValue.y | Acres.y !=  
Acres.x , ] )
    Bldgid Acres.x Year.x ImpValue.x Acres.y Year.y ImpValue.y
6       5  156.00   1955       5000    0.50   1955       1200
7       5    0.50   1955       1200  156.00   1955       5000
11      7  300.00   1990       1000    0.09   1991        300
12      7    0.09   1991        300  300.00   1990       1000
15      8  100.00   2000       1000   12.50   2000       1000
16      8   12.50   2000       1000  100.00   2000       1000

You should be able to take it from here ... after you figure out what  
it is that you want.

-- 
David.

>>
>> As I review your Boolean logic, I run into serious problems.
>>
>> c) and d) cannot be true if a and b) are true.
>>
>> So no cases satisfy all 4 specs. In particular both of the pairs you
>> say you want aggregated (5+6) and 10+11) violate rule a) and the
>> second pair also violates b).
>>
>> -- 
>> David
>>>
>>> but if the Acres and ImpValues are identical i would still like to
>>> add the
>>> Acres together and form one case.
>>> If the cases are put together i would also like to add the ImpValues
>>> together.  So the below
>>>
>>>   Acres Bldgid Year ImpValue
>>> 1  100.00      1 1946     1000
>>> 2  101.00      2 1952     1400
>>> 3  100.00      3 1922     1300
>>> 4  130.00      4 1910      900
>>> 5  156.00      5 1955     5000
>>> 6    0.50      5 1955     1200
>>> 7  293.00      6 1999      500
>>> 8  300.00      7 1990     1000
>>> 9    0.09      7 1991      300
>>> 10 100.00      8 2000     1000
>>> 11  12.50      8 2000     1000
>>>
>>> would become
>>>
>>>   Acres Bldgid Year ImpValue
>>> 1  100.00      1 1946     1000
>>> 2  101.00      2 1952     1400
>>> 3  100.00      3 1922     1300
>>> 4  130.00      4 1910      900
>>> 5  156.50     5 1955     6200
>>> 7  293.00      6 1999      500
>>> 8  300.09     7 1990     1300
>>> 10 112.50      8 2000     1000
>>>
>>> Thanks, i gave it a bunch of shots but nothing worth posting.
>>>
>>>
>>>
>>>
>>>
>>> PDXRugger wrote:
>>>>
>>>> Hello All,
>>>>  I would like to select records with identical IDs then sum an
>>>> attribute
>>>> then and return them to the data frame as a single record.  Please
>>>> consider
>>>>
>>>>
>>>> Acres<-c(100,101,100,130,156,.5,293,300,.09)
>>>> Bldgid<-c(1,2,3,4,5,5,6,7,7)
>>>>
>>>> DF=cbind(Acres,Bldgid)
>>>> DF<-as.data.frame(DF)
>>>>
>>>> So that:
>>>>
>>>> Acres Bldgid
>>>> 1 100.00      1
>>>> 2 101.00      2
>>>> 3 100.00      3
>>>> 4 130.00      4
>>>> 5 156.00      5
>>>> 6   0.50      5
>>>> 7 293.00      6
>>>> 8 300.00      7
>>>> 9   0.09      7
>>>>
>>>> Becomes
>>>>
>>>> Acres Bldgid
>>>> 1 100.00      1
>>>> 2 101.00      2
>>>> 3 100.00      3
>>>> 4 130.00      4
>>>> 5 156.50      5
>>>> 7 293.00      6
>>>> 8 300.09      7
>>>>
>>>> dup<-unique(DF$Bldgid[duplicated(Bldgid)])
>>>> dupbuild<-DF[DF$Bldgid %in% dup,]
>>>> dupbuild..dupareasum<-sum(dupbuild$Acres[duplicated(dupbuild
>>>> $Bldgid)])
>>>>
>>>> This sums the unique Ids of the duplicated records, not whati want.
>>>> Thanks ahead of time
>>>>
>>>> JR
>>>>
>>>>
>>>>
>>>
>>> -- 
>>> View this message in context:
>>> http://www.nabble.com/Summing-identical-IDs-tp26118922p26121056.html
>>> Sent from the R help mailing list archive at Nabble.com.
>>>
>>> ______________________________________________
>>> R-help at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide
>>> http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>
>> David Winsemius, MD
>> Heritage Laboratories
>> West Hartford, CT
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>>
>
> -- 
> View this message in context: http://old.nabble.com/Summing-identical-IDs-tp26118922p26135732.html
> Sent from the R help mailing list archive at Nabble.com.
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

David Winsemius, MD
Heritage Laboratories
West Hartford, CT




More information about the R-help mailing list