[R] aggregate and sum on some columns fromduplicate rows

arun smartpink111 at yahoo.com
Sat Mar 1 00:24:04 CET 2014


If, 'dat' is the dataset:
res <- as.data.frame(do.call(rbind,lapply(split(dat,list(dat$St.Sam,dat$Sp),drop=TRUE),function(x) c(unlist(x[1,1:4]),colSums(x[,5:6])))))
 row.names(res) <- 1:nrow(res)
res
   St.Sam  Sp Var1 Var2 NT PT
1  ST1.S1 Sp1   12   aa 23 37
2  ST1.S2 Sp1   25   dd 29 66
3  ST2.S1 Sp1   15   aa 30 45
4  ST3.S1 Sp1   25   aa 26 69
5  ST1.S1 Sp2   32   bb 45 26
6  ST1.S2 Sp2   59   ee 89 35
7  ST2.S1 Sp2   45   cc 55 23
8  ST3.S1 Sp2   36   bb 65 48
9  ST1.S1 Sp3   47   cc 89 35
10 ST2.S1 Sp3   27   aa 85 12
11 ST2.S1 Sp4   42   cc  8  3
A.K.




On Friday, February 28, 2014 5:31 PM, ltdm <Luis.Tito-de-Morais at ird.fr> wrote:
Hi list,

I have a dataframe df looking like this one:

St.Sam    Sp    Var1  Var2    NT    PT
ST1.S1    Sp1    12    aa    20    32
ST1.S1    Sp2    32    bb    45    26
ST1.S1    Sp1    12    aa     3     5
ST1.S1    Sp3    47    cc    89    35
ST1.S2    Sp1    25    dd    29    66
ST1.S2    Sp2    59    ee    89    35
ST2.S1    Sp1    15    aa    30    45
ST2.S1    Sp2    45    cc    55    23
ST2.S1    Sp3    27    aa    85    12
ST2.S1    Sp4    42    cc      8     3
ST3.S1    Sp1    25    aa    26    69
ST3.S1    Sp2    36    bb    56    36
ST3.S1    Sp2    36    bb     9     12

St.Sam : indicates the Station and the sample #
Sp is a species code
Var1 is numeric, Var2 is a factor
NT and PT are numeric
Please note that lines 2 and 3 are the same species (Sp1) in the same
station and sample. Var1 and Var2 have the same values and NT and PT differ.
Also the last 2 lines  are the same species in the same station and sample.
Var1 and Var2 have the same values, and NT and PT differ.

What I would like to have is a dataframe like this one:

St.Sam    Sp    Var1  Var2    NT    PT
ST1.S1    Sp1    12    aa    23    37
ST1.S1    Sp2    32    bb    45    26
ST1.S1    Sp3    47    cc    89    35
ST1.S2    Sp1    25    dd    29    66
ST1.S2    Sp2    59    ee    89    35
ST2.S1    Sp1    15    aa    30    45
ST2.S1    Sp2    45    cc    55    23
ST2.S1    Sp3    27    aa    85    12
ST2.S1    Sp4    42    cc      8      3
ST3.S1    Sp1    25    aa    26    69
ST3.S1    Sp2    36    bb    65    48

i.e. the lines with same pair of St.Sam/Sp have been aggregated and the
values NT and PT summed.
Var1 and Var2 are untouched (they are allways equal for a given pair
St.Sam/Species).

I know how to find and extract the lines of interest using "duplicated", but
I am unable to automatically sum the NT and PT. I have been struggling with
"aggregate" and the like but I am stuck.
I searched the web and the archives but found no answer nor similar
question.

Does anybody have a solution or a link to a page or suggestion to help me ?

Thank you very much for your help

Tito

Here is a quick&dirty way to build the toy dataframe above:
St.Sam <-
c("ST1.S1","ST1.S1","ST1.S1","ST1.S1","ST1.S2","ST1.S2","ST2.S1","ST2.S1","ST2.S1","ST2.S1","ST3.S1","ST3.S1","ST3.S1")
Species <-
c("Sp1","Sp2","Sp1","Sp3","Sp1","Sp2","Sp1","Sp2","Sp3","Sp4","Sp1","Sp2","Sp2")
Var1 <- c(12,32,12,47,25,59,15,45,27,42,25,36,36)
Var2 <- c("aa","bb","aa","cc","dd","ee","aa","cc","aa","cc","aa","bb","bb")
NT <- c(20,45,3,89,29,89,30,55,85,8,26,56,9)
PT <- c(32,26,5,35,66,35,45,23,12,3,69,36,12)
df <- data.frame(St.Sam,Species,Var1,Var2,NT,PT)





--
View this message in context: http://r.789695.n4.nabble.com/aggregate-and-sum-on-some-columns-fromduplicate-rows-tp4686040.html
Sent from the R help mailing list archive at Nabble.com.

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.





More information about the R-help mailing list