[R] highest and second highest value in row for each combination

Phil Spector spector at stat.berkeley.edu
Thu Feb 10 18:55:03 CET 2011


Alain -
    Here's a reproducible data set:

set.seed(19)
area<-c(rep(1,10),rep(2,10),rep(3,10),rep(4,10),rep(5,10))
type<-c(rep(1:10,5))
a<-rnorm(50)
b<-rnorm(50)
c<-rnorm(50)
d<-rnorm(50)
df<-cbind(area,type,a,b,c,d)

    First I'll make a helper function to operate on one 
row of df (note that cbind returns a numeric matrix here, 
not a data frame):

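get2 = function(x){
    # x is one row of df: drop area and type, keep a, b, c, d
    y = x[-c(1,2)]
    # positions of the values, largest first
    oy = order(y,decreasing=TRUE)
    # names of the value columns, taken from the row itself
    nms = names(y)
    # two rows: the highest and second highest value with their column names
    data.frame(area=rep(x[1],2),type=rep(x[2],2),
               max=y[oy[1:2]],colname=nms[oy[1:2]])
}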
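
To see what a helper like this returns, here is a small sketch that 
runs the same idea on just the first two rows printed in the original 
question. The names m and top2 are made up for this illustration, and 
top2 is a slight variant of get2 (it drops the names from the values so 
the row names come out as plain numbers), not the function above:

```r
# First two rows of the matrix printed in the original question
m <- rbind(c(1, 1,  0.45608192,  0.240378547,  2.05208079, -1.18827462),
           c(1, 2, -0.12119506, -0.028078577, -2.64323695, -0.83923441))
colnames(m) <- c("area", "type", "a", "b", "c", "d")

# Hypothetical variant of get2: one row in, a two-row data frame out
top2 <- function(x) {
    y  <- x[-c(1, 2)]                        # keep only a, b, c, d
    oy <- order(y, decreasing = TRUE)[1:2]   # positions of the two largest
    data.frame(area = x[[1]], type = x[[2]],
               max = unname(y[oy]), colname = names(y)[oy],
               stringsAsFactors = FALSE)
}

# apply() returns a list of data frames here; rbind stacks them
res <- do.call(rbind, apply(m, 1, top2))
res
```

For these two rows the columns picked are c and a, then b and a, which 
agrees with the first four lines of the desired output in the question.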

Now I can use apply, do.call and rbind to get the answer:

> answer = do.call(rbind,apply(df,1,get2))
> head(answer)
    area type        max colname
b     1    1  1.7036697       b
c     1    1  0.7910130       c
c1    1    2  2.4576579       c
a     1    2  0.3885812       a
c2    1    3  1.2363598       c
a1    1    3 -0.3443333       a

(My numbers differ from yours because you didn't set
a seed for the random number generator.)

I'm not exactly sure how to form your column "combination", though.
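
One guess, assuming "combination" is just area and type pasted 
together (the "11" and "12" in your example suggest that), would be 
something like the sketch below. The column names combination and 
combination_sep are my own choice, and the small data frame only 
repeats the six rows of head(answer) shown above:

```r
# First six rows of the answer, values copied from head(answer) above
answer <- data.frame(area = c(1, 1, 1, 1, 1, 1),
                     type = c(1, 1, 2, 2, 3, 3),
                     max = c(1.7036697, 0.7910130, 2.4576579,
                             0.3885812, 1.2363598, -0.3443333),
                     colname = c("b", "c", "c", "a", "c", "a"))

# Assumption: "combination" is area and type pasted together, e.g. "11", "12"
answer$combination <- paste0(answer$area, answer$type)

# With a separator the label stays unambiguous when a number has
# more than one digit (area 1 with type 10 would otherwise read "110")
answer$combination_sep <- paste(answer$area, answer$type, sep = ".")
answer
```

If the combination is meant to be something else, only the paste0 
line would need to change.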

 					- Phil Spector
 					 Statistical Computing Facility
 					 Department of Statistics
 					 UC Berkeley
 					 spector at stat.berkeley.edu


On Thu, 10 Feb 2011, Alain D. wrote:

> Dear R-List,
>
> I have a dataframe
>
> area<-c(rep(1,10),rep(2,10),rep(3,10),rep(4,10),rep(5,10))
> type<-c(rep(1:10,5))
> a<-rnorm(50)
> b<-rnorm(50)
> c<-rnorm(50)
> d<-rnorm(50)
> df<-cbind(area,type,a,b,c,d)
>
>
> df
>       area type           a            b           c           d
>  [1,]    1    1  0.45608192  0.240378547  2.05208079 -1.18827462
>  [2,]    1    2 -0.12119506 -0.028078577 -2.64323695 -0.83923441
>  [3,]    1    3  0.09066133 -1.134069619  1.53344812 -0.15670239
>  [4,]    1    4 -1.34505241  1.919941172 -1.02090099  0.75664358
>  [5,]    1    5 -0.29279617 -0.314955019 -0.88809266  2.22282022
>  [6,]    1    6 -0.59697893 -0.652937746  1.05132400 -0.02469151
>  [7,]    1    7 -1.18199400  0.728165962 -1.51419348  0.65640976
>  [8,]    1    8 -0.72925659  0.303514237  0.79758488  0.93444350
>  [9,]    1    9 -1.60080508 -0.187562633  0.51288428 -0.55692877
> [10,]    1   10  0.54373268 -0.494994392  0.52902381  1.12938122
> [11,]    2    1 -1.29675664 -0.644990784 -2.44067511 -0.18489544
> [12,]    2    2  0.86330699  1.458038882  1.17514710  1.32896878
> [13,]    2    3  0.30069402  1.361211939  0.84757211  1.14502761
> ...
>
> Now I want to have for each combination of area and type the name and
> corresponding value of the two columns with the highest and second highest
> value a,b,c,d.
> In the above example it should be something like
>
> combination         max     colname
> 11                      2.05          c
> 11                      0.46          a
> 12                     -0.03          b
> 12                     -0.12          a
> ...
>
> (It might be arranged differently, though)
>
> Can anyone help?
>
> Thank you in advance!
>
> Alain