[R] Summing rows based on criteria

PDXRugger J_R_36 at hotmail.com
Mon Nov 2 18:56:07 CET 2009


I am attempting to clean up some land use building data and need to join some
buildings together making sure not to double count GIS slivers.  The first
data.frame is the original, the 2nd adds all the acres for each identical
bldgid.  I now want to 
a) throw out all but one of the the cases where the Years and ImpValue are
Identical, 
b) Sum the impvalues based on: 
                     1) The Years are identical and the ImpValue are not
                     2)The ImpValues are identical and the Years are not

Resulting in the 3rd data frame.

Please consider the following

DF=cbind(Acres,Bldgid,Year,ImpValue)
 DF<-as.data.frame(DF)
 DF
    Acres     Bldgid Year ImpValue
1  100.00      1 1946     1000
2  101.00      2 1952     1400
3  100.00      3 1922     1300
4  130.00      4 1910      900
5  156.00      5 1955     5000
6    0.50      5 1955     1200
7  293.00      6 1999      500
8  300.00      7 1990     9000
9    0.09      7 1991     9000
10 100.00      8 2000     1000
11  12.50      8 2000     1000

#Aggregate acres where identical ids
dupbuild<-aggregate(DF$Acres,DF["Bldgid"],sum)
 colnames(dupbuild)[2]<-"Acres"
 
 #Add aggregated Acres to DF
 DF$Acres<-dupbuild$Acres[match(DF$Bldgid,dupbuild$Bldgid)]
 DF
   Acres     dgid Year ImpValue
1  100.00      1 1946     1000
2  101.00      2 1952     1400
3  100.00      3 1922     1300
4  130.00      4 1910      900
5  156.50      5 1955     5000
6  156.50      5 1955     1200
7  293.00      6 1999      500
8  300.09      7 1990     9000
9  300.09      7 1991     9000
10 112.50      8 2000     1000
11 112.50      8 2000     1000

#desired outcome data frame 
   Acres     dgid Year ImpValue
1  100.00      1 1946     1000
2  101.00      2 1952     1400
3  100.00      3 1922     1300
4  130.00      4 1910      900
5  156.50      5 1955     7200 #combined 5 & 6 
7  293.00      6 1999      500
8  300.09      7 1990     18000 #combined 8 & 9
10 112.50      8 2000     1000 #one case thrown out

So in this case the Impvalue are added together for rows 5 & 6 (from
dataframe example 2) b/c the years are identical and the Impvalue is not,
rows 8 and 9 have their Impvalue summed because there Years are identical
but the improvement value is not, and one of the cases is thrown out of rows
10 & 11 because they have identical years and ImpValue.  

When rows are joined the Year value is no longer important but to remain
consistent i would like to keep the earliest (lowest) year.  There will be
instances in the actual data where there are more than 2 cases to consider
if that makes any coding difference as i didnt include any in my example
data.  It would also be useful to include a new column keeping track of how
many joined bldgids .   I think i can figure that one out though.  

Hope this is all clear.  Thanks for the guidance and insights

JR

-- 
View this message in context: http://old.nabble.com/Summing-rows-based-on-criteria-tp26157755p26157755.html
Sent from the R help mailing list archive at Nabble.com.




More information about the R-help mailing list