[R] aggregate and list elements of variables in data.frame

Thu Jun 7 14:28:07 CEST 2018

Using which() to subset t$id should do the trick:

# assumes t$A is already a factor, e.g. t$A <- factor(t$A) as in your earlier solution
sapply(levels(t$A), function(x) t$id[which(t$A==x)])
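Here is a self-contained sketch of this approach, run on the new example from the follow-up question. It goes beyond the one-liner in two ways. First, A is turned into a factor, because levels() returns NULL for a numeric column. Second, the grouped ids are pasted into the comma-separated strings the question asks for. The names fA, ids and r are illustrative only.

```r
# Data from the follow-up question: id values are no longer 1:10
t <- data.frame(id = c(18, 91, 20, 68, 54, 27, 26, 15, 4, 97),
                A  = c(123, 345, 123, 678, 345, 123, 789, 345, 123, 789))

# levels() is NULL for a numeric column, so group on a factor copy of A
fA <- factor(t$A)

# For each level, subset t$id (the values, not the row positions).
# simplify = FALSE keeps a named list even if all groups had the same size.
ids <- sapply(levels(fA), function(x) t$id[which(fA == x)], simplify = FALSE)

# Collapse each group into one comma-separated string
r <- data.frame(unique_A = as.numeric(names(ids)),
                list_id  = unname(vapply(ids, paste, character(1), collapse = ",")),
                stringsAsFactors = FALSE)
r
```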

Ivan

--
Dr. Ivan Calandra
TraCEr, laboratory for Traceology and Controlled Experiments
MONREPOS Archaeological Research Centre and
Museum for Human Behavioural Evolution
Schloss Monrepos
56567 Neuwied, Germany
+49 (0) 2631 9772-243
https://www.researchgate.net/profile/Ivan_Calandra

On 07/06/2018 14:21, Massimo Bressan wrote:
> sorry, but looking further at the example I just realised that the posted solution is not quite what I need: in fact I do not need to get back the 'indices' but instead the corresponding values of column id for each value of A
>
> #please consider this new example
>
> t<-data.frame(id=c(18,91,20,68,54,27,26,15,4,97),A=c(123,345,123,678,345,123,789,345,123,789))
> t
>
> # I need to get this result
> r<-data.frame(unique_A=c(123, 345, 678, 789),list_id=c('18,20,27,4','91,54,15','68','26,97'))
> r
>
> # any help for this, please?
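Base R's aggregate() can also build this table directly. FUN may be any function that returns one value per group, so paste() with collapse = "," can join the ids. This is a sketch, checked only against this example:

```r
t <- data.frame(id = c(18, 91, 20, 68, 54, 27, 26, 15, 4, 97),
                A  = c(123, 345, 123, 678, 345, 123, 789, 345, 123, 789))

# aggregate() applies FUN to the id values within each group of A;
# the extra argument collapse = "," is passed on to paste()
r <- aggregate(id ~ A, data = t, FUN = paste, collapse = ",")

# Rename the columns to match the requested result
names(r) <- c("unique_A", "list_id")
r
```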
>
> From: "Massimo Bressan" <massimo.bressan using arpa.veneto.it>
> To: "r-help" <R-help using r-project.org>
> Sent: Thursday, 7 June 2018 10:09:55
> Subject: Re: aggregate and list elements of variables in data.frame
>
> thanks for the help
>
> here is the complete solution
>
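> t<-data.frame(id=1:10,A=c(123,345,123,678,345,123,789,345,123,789))
> t$A <- factor(t$A)
> # which() returns row positions; they equal the id values only because id = 1:10
> l<-sapply(levels(t$A), function(x) which(t$A==x))
> # collapse each group into one string, e.g. "1, 3, 6, 9"
> r<-data.frame(list_id=unlist(lapply(l, paste, collapse = ", ")))
> # the levels of A ended up as row names; move them into a column
> r<-cbind(unique_A=row.names(r),r)
> row.names(r)<-NULL
> r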
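The solution above works because which() returns row positions, and those positions match the id values only when id = 1:10. The sketch below groups the id values themselves, using split() instead of which(). It keeps the ", " separator of the posted solution. The name groups is illustrative only.

```r
t <- data.frame(id = 1:10, A = c(123, 345, 123, 678, 345, 123, 789, 345, 123, 789))

# split() returns a named list: one vector of id values per distinct A
groups <- split(t$id, t$A)

# Collapse each group; the list names (the values of A) become unique_A
r <- data.frame(unique_A = names(groups),
                list_id  = unname(vapply(groups, paste, character(1), collapse = ", ")),
                stringsAsFactors = FALSE)
r
```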
>
> best
>
> From: "Massimo Bressan" <massimo.bressan using arpa.veneto.it>
> To: "r-help" <R-help using r-project.org>
> Sent: Wednesday, 6 June 2018 10:13:10
> Subject: aggregate and list elements of variables in data.frame
>
> #given the following reproducible and simplified example
>
> t<-data.frame(id=1:10,A=c(123,345,123,678,345,123,789,345,123,789))
> t
>
> #I need to get the following result
>
> r<-data.frame(unique_A=c(123, 345, 678, 789),list_id=c('1,3,6,9','2,5,8','4','7,10'))
> r
>
> # i.e. group by the variable "A" and list all values of the variable "id" that share the same value of "A"
> #any help for that?
>
> # so far I've only managed to "aggregate" and "count", like this:
>
> library(sqldf)
> sqldf('select count(*) as count_id, A as unique_A from t group by A')
>
> library(dplyr)
> t%>%group_by(unique_A=A) %>% summarise(count_id = n())
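The dplyr pipeline above needs only one change to its summary function. Instead of counting rows with n(), paste() can join the id values within each group. This is a sketch that assumes dplyr is installed. It returns a tibble rather than a plain data.frame.

```r
library(dplyr)

t <- data.frame(id = 1:10, A = c(123, 345, 123, 678, 345, 123, 789, 345, 123, 789))

# Same grouping as the counting example, but each group's id values
# are pasted into one comma-separated string instead of counted
r <- t %>%
  group_by(unique_A = A) %>%
  summarise(list_id = paste(id, collapse = ","))
r
```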
>
> # thank you
>
>