[R] How to get the most frequent value of the subgroup

Milan Bouchet-Valat nalimilan at club.fr
Fri Mar 30 17:54:49 CEST 2012


On Friday 30 March 2012 at 11:39 -0400, David Winsemius wrote:
> On Mar 30, 2012, at 3:38 AM, Milan Bouchet-Valat wrote:
> 
> > On Thursday 29 March 2012 at 09:49 -0500, Yongsuhk Jung wrote:
> >> Dear Members of the R-Help,
> >>
> >>
> >>
> >> While using 'aggregate', an R function that you developed, I came up
> >> with a question.
> >>
> >> In that function,
> >>
> >>
> >>
> >>> aggregate(x, by, FUN, ..., simplify = TRUE)
> >>
> >>
> >>
> >> I was wondering what kind of FUN I should write if I want to get
> >> "the most frequent value of the subgroup" as a summary statistic of
> >> the subgroups.
> >>
> >> I would appreciate hearing your ideas on this issue.
> > It would have been better if you had provided sample data, as
> > requested by the posting guide.
> 
> How TRUE.
> 
> >
> > Anyway, here's a possibility:
> >> df <- data.frame(a=rep(1:3, 2), b=c(1, 2, 2, 1, 1, 2))
> >> df
> >  a b
> > 1 1 1
> > 2 2 2
> > 3 3 2
> > 4 1 1
> > 5 2 1
> > 6 3 2
> >> aggregate(df$a, list(df$b), function(x) max(table(x)))
> >  Group.1 x
> > 1       1 2
> > 2       2 2
> 
> Prompted by the obvious error in that solution (the mode of a within
> b==1 is 1 and the mode of a within b==2 is 3), I thought I would take
> my untested code strategy and fix it as well, now that an example was
> "on the table" for discussion:
> 
>  > aggregate(df[1], by=df[2], FUN=function(x) {
>        tbl <- table(x)
>        return(dimnames(tbl)[[1]][which.max(tbl)])
>    })
>    b a
> 1 1 1
> 2 2 3
> 
> ( The modal values are in the "a" column.)
Hm, you're right: if you want the mode rather than the frequency of the
mode, my "solution" is not enough. But it's quite straightforward to
extend it:
aggregate(df$a, list(df$b), function(x) names(which.max(table(x))))
  Group.1 x
1       1 1
2       2 3

Which solution is the most elegant is for the OP to decide (mine is a
little weird). ;-)
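A minimal sketch, not from the thread: both table()-based answers above return the mode as a character string (the table's names), and when two values tie they pick the one that sorts first. If the original type matters, a helper along these lines may work. The name mode_value is hypothetical, and it assumes ties should go to the value that appears first in each subgroup.

```r
# Hypothetical helper (not from the thread): return the most frequent
# value of x while keeping x's original type. In case of a tie, the
# value that appears first in x wins, because which.max() returns the
# first maximum.
mode_value <- function(x) {
  ux <- unique(x)                   # distinct values, in order of first appearance
  counts <- tabulate(match(x, ux))  # number of occurrences of each distinct value
  ux[which.max(counts)]
}

df <- data.frame(a = rep(1:3, 2), b = c(1, 2, 2, 1, 1, 2))
res <- aggregate(df$a, list(df$b), mode_value)
res
#   Group.1 x
# 1       1 1
# 2       2 3
```

The tie rule is where the approaches differ: names(which.max(table(c(3, 1, 1, 3)))) gives the string "1", while mode_value(c(3, 1, 1, 3)) gives the number 3. Which behaviour is appropriate depends on the data.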


Cheers


