[R] highest and second highest value in row for each combination

jim holtman jholtman at gmail.com
Fri Feb 11 01:38:43 CET 2011


Here is another way of doing it, using melt() from reshape2 and then taking the top two values within each area/type group:

> set.seed(19)
>
> area<-c(rep(1,10),rep(2,10),rep(3,10),rep(4,10),rep(5,10))
> type<-c(rep(1:10,5))
> a<-rnorm(50)
> b<-rnorm(50)
> c<-rnorm(50)
> d<-rnorm(50)
> df<-cbind(area,type,a,b,c,d)
> df1 <- data.frame(df)
> require(reshape2)
> df.melt <- melt(df1, id=c('area', 'type'))
> result <- do.call(rbind,
+     lapply(split(df.melt, list(df.melt$area, df.melt$type), drop=TRUE), function(x){
+         # get at most the first two if present
+         head(x[order(x$value, decreasing=TRUE),], 2)
+     })
+ )
>
> head(result, 10)  # first 10 of the 100 rows
         area type variable       value
1.1.51      1    1        b  1.70366970
1.1.101     1    1        c  0.79101298
2.1.161     2    1        d  1.56797593
2.1.61      2    1        b  0.79868725
3.1.21      3    1        a  1.42342348
3.1.121     3    1        c  0.44547975
4.1.131     4    1        c  1.72745545
4.1.31      4    1        a  1.50474144
5.1.141     5    1        c  1.72521942
5.1.191     5    1        d  0.52466470
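
For readers who want the layout Alain asked for (one "combination" label
with the value and the column name), the following base R sketch is one
possible way to build it. It assumes the same seeded data as above. The
names vals, top2, idx, pos and out are illustrative and do not appear in
the thread, and pasting area and type together without a separator is only
an assumption based on the "11", "12" labels in the original question.

```r
# Rebuild the example data from the thread (same seed, same call order).
set.seed(19)
area <- rep(1:5, each = 10)
type <- rep(1:10, 5)
a <- rnorm(50)
b <- rnorm(50)
c <- rnorm(50)
d <- rnorm(50)
df1 <- data.frame(area, type, a, b, c, d)

# Numeric matrix holding only the value columns.
vals <- as.matrix(df1[, c("a", "b", "c", "d")])

# 2 x 50 matrix: for each input row, the column positions of the
# largest and the second-largest value.
top2 <- apply(vals, 1, function(v) order(v, decreasing = TRUE)[1:2])

# Two output rows per input row: the maximum first, then the runner-up.
idx <- rep(seq_len(nrow(vals)), each = 2)
pos <- as.vector(top2)

out <- data.frame(
  # Assumption: "combination" is area and type pasted together, as in "11".
  combination = paste0(df1$area, df1$type)[idx],
  max         = vals[cbind(idx, pos)],
  colname     = colnames(vals)[pos]
)

head(out, 4)
```

Each pair of rows in out holds the largest and then the second largest
value for one area/type row. With larger area or type codes, a separator
(for example paste(area, type, sep = ".")) would keep the labels
unambiguous.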


On Thu, Feb 10, 2011 at 12:55 PM, Phil Spector
<spector at stat.berkeley.edu> wrote:
> Alain -
>   Here's a reproducible data set:
>
> set.seed(19)
> area<-c(rep(1,10),rep(2,10),rep(3,10),rep(4,10),rep(5,10))
> type<-c(rep(1:10,5))
> a<-rnorm(50)
> b<-rnorm(50)
> c<-rnorm(50)
> d<-rnorm(50)
> df<-cbind(area,type,a,b,c,d)
>
>   First I'll make a helper function to operate on one row of the data frame:
>
> get2 = function(x){
>   y = x[-c(1,2)]
>   oy = order(y,decreasing=TRUE)
>   nms = colnames(df)[-c(1,2)]
>   data.frame(area=rep(x[1],2),type=rep(x[2],2),
>              max=y[oy[1:2]],colname=nms[oy[1:2]])
> }
>
> Now I can use apply, do.call and rbind to get the answer:
>
>> answer = do.call(rbind,apply(df,1,get2))
>> head(answer)
>
>   area type        max colname
> b     1    1  1.7036697       b
> c     1    1  0.7910130       c
> c1    1    2  2.4576579       c
> a     1    2  0.3885812       a
> c2    1    3  1.2363598       c
> a1    1    3 -0.3443333       a
>
> (My numbers differ from yours because you didn't specify
> a seed for the random number generator)
>
> I'm not exactly sure how to form your column "combination", though.
>
>                                        - Phil Spector
>                                         Statistical Computing Facility
>                                         Department of Statistics
>                                         UC Berkeley
>                                         spector at stat.berkeley.edu
>
>
> On Thu, 10 Feb 2011, Alain D. wrote:
>
>> Dear R-List,
>>
>> I have a dataframe
>>
>> area<-c(rep(1,10),rep(2,10),rep(3,10),rep(4,10),rep(5,10))
>> type<-c(rep(1:10,5))
>> a<-rnorm(50)
>> b<-rnorm(50)
>> c<-rnorm(50)
>> d<-rnorm(50)
>> df<-cbind(area,type,a,b,c,d)
>>
>>
>> df
>>      area type              a            b           c           d
>> [1,]    1    1     0.45608192  0.240378547  2.05208079 -1.18827462
>> [2,]    1    2    -0.12119506 -0.028078577 -2.64323695 -0.83923441
>> [3,]    1    3     0.09066133 -1.134069619  1.53344812 -0.15670239
>> [4,]    1    4    -1.34505241  1.919941172 -1.02090099  0.75664358
>> [5,]    1    5    -0.29279617 -0.314955019 -0.88809266  2.22282022
>> [6,]    1    6    -0.59697893 -0.652937746  1.05132400 -0.02469151
>> [7,]    1    7    -1.18199400  0.728165962 -1.51419348  0.65640976
>> [8,]    1    8    -0.72925659  0.303514237  0.79758488  0.93444350
>> [9,]    1    9    -1.60080508 -0.187562633  0.51288428 -0.55692877
>> [10,]    1   10    0.54373268 -0.494994392  0.52902381  1.12938122
>> [11,]    2    1    -1.29675664 -0.644990784 -2.44067511 -0.18489544
>> [12,]    2    2     0.86330699  1.458038882  1.17514710  1.32896878
>> [13,]    2    3     0.30069402  1.361211939  0.84757211  1.14502761
>> ...
>>
>> Now I want to have for each combination of area and type the name and
>> corresponding value of the two columns with the highest and second highest
>> value a,b,c,d.
>> In the above example it should be something like
>>
>> combination         max     colname
>> 11                      2.05          c
>> 11                      0.46          a
>> 12                     -0.03          b
>> 12                     -0.12          a
>> ...
>>
>> (It might be arranged differently, though)
>>
>> Can anyone help?
>>
>> Thank you in advance!
>>
>> Alain
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>



-- 
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?


