[R] aggregate and list elements of variables in data.frame

Ben Tupper btupper at bigelow.org
Thu Jun 7 14:47:55 CEST 2018


Hi,

Does this do what you want?  I had to change the id values to something more obvious (letters instead of numbers).  It uses tibbles, which allow a column to be a list, so each row can hold a whole vector of indices or ids.

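library(tibble)
library(dplyr)
x       <- tibble(id=LETTERS[1:10],
                A=c(123,345,123,678,345,123,789,345,123,789))
# unique values of A, in order of first appearance
uA      <- unique(x$A)
# for each unique value, the row indices where it occurs
idx     <- lapply(uA, function(v) which(x$A %in% v))
# for each unique value, the matching id values
vals    <- lapply(idx, function(index) x$id[index])

# one row per unique value; list_idx and list_vals are list-columns
r <- tibble(unique_A = uA, list_idx = idx, list_vals = vals)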


> r
# A tibble: 4 x 3
  unique_A list_idx  list_vals
     <dbl> <list>    <list>   
1     123. <int [4]> <chr [4]>
2     345. <int [3]> <chr [3]>
3     678. <int [1]> <chr [1]>
4     789. <int [2]> <chr [2]>
> r$list_idx[1]
[[1]]
[1] 1 3 6 9

> r$list_vals[1]
[[1]]
[1] "A" "C" "F" "I"
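
If you would rather have the comma-separated strings from your second example than list-columns, here is a minimal sketch in base R. The names d, ids_by_A and r2 are just placeholders I made up; the data are the ids and A values from your follow-up post. Note that split() and aggregate() return the groups sorted by A, which happens to match the order you asked for.

```r
# Example data from the follow-up question (ids are arbitrary numbers).
d <- data.frame(id = c(18, 91, 20, 68, 54, 27, 26, 15, 4, 97),
                A  = c(123, 345, 123, 678, 345, 123, 789, 345, 123, 789))

# Split the ids by the value of A; split() returns a named list with
# one element per unique value of A, sorted by A.
ids_by_A <- split(d$id, d$A)

# Collapse each group of ids into a single comma-separated string.
r <- data.frame(unique_A = as.numeric(names(ids_by_A)),
                list_id  = unname(vapply(ids_by_A, paste, character(1),
                                         collapse = ",")),
                stringsAsFactors = FALSE)
r

# Alternative with aggregate(); here the result columns are named A and id.
r2 <- aggregate(id ~ A, data = d, FUN = paste, collapse = ",")
r2
```

The vapply() call is used instead of sapply() so that the result is guaranteed to be a character vector with one string per group.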


Cheers,
ben



> On Jun 7, 2018, at 8:21 AM, Massimo Bressan <massimo.bressan using arpa.veneto.it> wrote:
> 
> Sorry, but on looking further at the example I realised that the posted solution is not quite what I need: I do not need the indices back, but rather the values of 'id' that correspond to each value of column A.
> 
> #please consider this new example 
> 
> t<-data.frame(id=c(18,91,20,68,54,27,26,15,4,97),A=c(123,345,123,678,345,123,789,345,123,789)) 
> t 
> 
> # I need to get this result 
> r<-data.frame(unique_A=c(123, 345, 678, 789),list_id=c('18,20,27,4','91,54,15','68','26,97')) 
> r 
> 
> # any help for this, please? 
> 
> 
> 
> 
> 
> From: "Massimo Bressan" <massimo.bressan using arpa.veneto.it> 
> To: "r-help" <R-help using r-project.org> 
> Sent: Thursday, 7 June 2018 10:09:55 
> Subject: Re: aggregate and list elements of variables in data.frame 
> 
> thanks for the help 
> 
> I'm posting here the complete solution 
> 
> t<-data.frame(id=1:10,A=c(123,345,123,678,345,123,789,345,123,789)) 
> t$A <- factor(t$A) 
> l<-sapply(levels(t$A), function(x) which(t$A==x)) 
> r<-data.frame(list_id=unlist(lapply(l, paste, collapse = ", "))) 
> r<-cbind(unique_A=row.names(r),r) 
> row.names(r)<-NULL 
> r 
> 
> best 
> 
> 
> 
> From: "Massimo Bressan" <massimo.bressan using arpa.veneto.it> 
> To: "r-help" <R-help using r-project.org> 
> Sent: Wednesday, 6 June 2018 10:13:10 
> Subject: aggregate and list elements of variables in data.frame 
> 
> #given the following reproducible and simplified example 
> 
> t<-data.frame(id=1:10,A=c(123,345,123,678,345,123,789,345,123,789)) 
> t 
> 
> #I need to get the following result 
> 
> r<-data.frame(unique_A=c(123, 345, 678, 789),list_id=c('1,3,6,9','2,5,8','4','7,10')) 
> r 
> 
> # i.e. aggregate over the variable "A" and list all elements of the variable "id" that share the same corresponding value of "A" 
> #any help for that? 
> 
> #so far I've just managed to "aggregate" and "count", like: 
> 
> library(sqldf) 
> sqldf('select count(*) as count_id, A as unique_A from t group by A') 
> 
> library(dplyr) 
> t%>%group_by(unique_A=A) %>% summarise(count_id = n()) 
> 
> # thank you 
> 
> 
> -- 
> 
> ------------------------------------------------------------ 
> Massimo Bressan 
> 
> ARPAV 
> Agenzia Regionale per la Prevenzione e 
> Protezione Ambientale del Veneto 
> 
> Dipartimento Provinciale di Treviso 
> Via Santa Barbara, 5/a 
> 31100 Treviso, Italy 
> 
> tel: +39 0422 558545 
> fax: +39 0422 558516 
> e-mail: massimo.bressan using arpa.veneto.it 
> ------------------------------------------------------------ 
> 
> 

Ben Tupper
Bigelow Laboratory for Ocean Sciences
60 Bigelow Drive, P.O. Box 380
East Boothbay, Maine 04544
http://www.bigelow.org

Ecological Forecasting: https://eco.bigelow.org/





