[R] Trouble retrieving the second largest value from each row of a data.frame

Joshua Wiley jwiley.psych at gmail.com
Sun Jul 25 02:57:10 CEST 2010


On Sat, Jul 24, 2010 at 5:09 PM, David Winsemius <dwinsemius at comcast.net> wrote:
>
> On Jul 24, 2010, at 4:54 PM, <mpward at illinois.edu> wrote:
>
>> THANKS, but I have one issue and one question.
>>
>> For some reason the "secondstrongest" values for rows 3 and 6 are incorrect
>> (they are the strongest); the remaining 10 are correct??
>
> In my run of Wiley's code I instead get identical values for rows 2,5,6.

Yes, my apologies; I neglected a [-strongest] when extracting the
second highest value.  I have included a corrected form below; however,
Winsemius' code is cleaner, not to mention easier to generalize, so I
see no reason not to use that option.  You might also consider using a
different object name than 'diff', since it is also the name of a
base R function.

Josh

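#######
# Corrected version: the second highest value is now taken from
# data[-strongest], the same vector that secondstrongest indexes
my.finder <- function(mydata) {
  # Applied to one row at a time (a named numeric vector)
  my.fun <- function(data) {
    strongest <- which.max(data)
    # Position within the vector with the strongest value removed
    secondstrongest <- which.max(data[-strongest])
    strongestantenna <- names(data)[strongest]
    secondstrongantenna <- names(data[-strongest])[secondstrongest]
    # c() coerces everything to character; converted back below
    value <- matrix(c(data[strongest], data[-strongest][secondstrongest],
                      strongestantenna, secondstrongantenna), ncol = 4)
    return(value)
  }
  dat <- apply(mydata, 1, my.fun)
  dat <- t(dat)
  dat <- as.data.frame(dat, stringsAsFactors = FALSE)
  colnames(dat) <- c("strongest", "secondstrongest",
                     "strongestantenna", "secondstrongantenna")
  # Restore the numeric type of the two value columns
  dat[ , "strongest"] <- as.numeric(dat[ , "strongest"])
  dat[ , "secondstrongest"] <- as.numeric(dat[ , "secondstrongest"])
  return(dat)
}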
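
To see what the missing [-strongest] did, here is a minimal sketch using row 6 of the example data (the name row6 is only for illustration). The position returned by which.max(data[-strongest]) refers to the shortened vector, so using it on the full vector can pick the wrong element:

```r
# Row 6 of the example data, as apply() would pass it to my.fun
row6 <- c(value0 = -11068, value60 = -11698, value120 = -12430,
          value180 = -12430, value240 = -12430, value300 = -12814)

strongest <- which.max(row6)                    # position 1 (value0)
secondstrongest <- which.max(row6[-strongest])  # position 1 of the SHORTENED vector

wrong <- row6[secondstrongest]              # full vector: value0 again
right <- row6[-strongest][secondstrongest]  # shortened vector: value60
```

Whenever the strongest value sits at or before the position of the second strongest, the uncorrected indexing lands on a different element than intended.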




> Holtman's and my solutions did not suffer from that defect, although mine
> suffered from my misreading of your request, thinking that you wanted the
> top 3. The fix is trivial
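
As a rough illustration of why an order()-based approach makes switching between the top 3 and the top 2 a one-argument change, here is a sketch. It is not Winsemius' actual code; the function name top.k and the argument k are made up for this example, and it assumes the columns are named:

```r
top.k <- function(mydata, k = 2) {
  m <- as.matrix(mydata)
  # k x nrow(m) matrix: column positions of the k largest values in each row
  idx <- vapply(seq_len(nrow(m)),
                function(i) order(m[i, ], decreasing = TRUE)[seq_len(k)],
                integer(k))
  idx <- matrix(idx, nrow = k)   # keep the shape even when k = 1
  rows <- rep(seq_len(nrow(m)), each = k)
  list(value   = matrix(m[cbind(rows, as.vector(idx))], ncol = k, byrow = TRUE),
       antenna = matrix(colnames(m)[idx], ncol = k, byrow = TRUE))
}

# Rows 1 and 6 of the example data
m <- rbind(c(-13007, -11707, -11072, -12471, -12838, -13357),
           c(-11068, -11698, -12430, -12430, -12430, -12814))
colnames(m) <- c("value0", "value60", "value120",
                 "value180", "value240", "value300")
top.k(m, k = 2)
```

Because order() is stable, tied values are returned in column order.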
>>
>> These data are being used to track radio-tagged birds; they are from
>> automated radio telemetry receivers.  I will be applying the following formula
>>
>>  diff <- ((strongest- secondstrongest)/100)
>>  bearingdiff <-30-(-0.0624*(diff**2))-(2.8346*diff)
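
As a quick check of the formula as posted, here is a sketch applying it to row 1 of the example data (strongest = -11072, secondstrongest = -11707). The name sig.diff is used instead of diff only to avoid masking the function of that name; ** and ^ are the same operator in R:

```r
strongest <- -11072
secondstrongest <- -11707

sig.diff <- (strongest - secondstrongest) / 100   # 6.35
bearingdiff <- 30 - (-0.0624 * sig.diff^2) - (2.8346 * sig.diff)
bearingdiff   # about 14.52
```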
>
> vals <- c("value0", "value60", "value120", "value180", "value240",
> "value300")
> value.str2 <- (match(yourdata$secondstrongestantenna, vals)-1)*60
> value.str1 <- (match(yourdata$strongestantenna, vals)-1)*60
> change.ind <- abs(match(yourdata$strongestantenna, vals) -
>                   match(yourdata$secondstrongestantenna, vals))
>
>>
>> A) Then the bearingdiff is added to strongestantenna (value0 = 0 degrees)
>> if the secondstrongestantenna is greater (e.g. value0 and value60),
>
>> B) or if the secondstrongestantenna is smaller than the strongestantenna,
>> then the bearingdiff is subtracted from the strongestantenna.
>
>>
>> C) The only exception is that if value0 (0 degrees) is strongest and
>> value300 (300 degrees) is the secondstrongestantenna then the bearing is
>> 360-bearingdiff.
>
>
>> D) Also the strongestantenna and secondstrongestantenna have to be next to
>> each other (e.g. value0 with value60, value240 with value300, value0 with
>> value300) or the results should be NA.
>
> After setting finalbearing with A, B, and C then:
> yourdata$finalbearing <- with(yourdata, ifelse (
>                                change.ind <5 & change.ind > 1 ,
>                                             NA, finalbearing) )
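
Rules A through D can be sketched as one vectorized function, which avoids the chain of if/else statements. This is only a sketch under assumptions: the function name final.bearing and its argument names are made up, and the mirror of rule C (value300 strongest with value0 second) is not spelled out in the post, so the code follows rule B literally there, which may not be what is wanted.

```r
vals <- c("value0", "value60", "value120", "value180", "value240", "value300")

final.bearing <- function(strongest, secondstrongest,
                          strongestantenna, secondstrongantenna) {
  i1 <- match(strongestantenna, vals)      # 1..6, i.e. 0..300 degrees
  i2 <- match(secondstrongantenna, vals)
  angle1 <- (i1 - 1) * 60

  sig.diff <- (strongest - secondstrongest) / 100
  bearingdiff <- 30 - (-0.0624 * sig.diff^2) - (2.8346 * sig.diff)

  # A) second antenna at a larger angle: add; B) at a smaller angle: subtract
  bearing <- ifelse(i2 > i1, angle1 + bearingdiff, angle1 - bearingdiff)
  # C) value0 strongest with value300 second: measure back from 360
  bearing <- ifelse(i1 == 1 & i2 == 6, 360 - bearingdiff, bearing)
  # D) antennas that are not neighbours (index gap of 2, 3 or 4) give NA
  gap <- abs(i1 - i2)
  ifelse(gap == 1 | gap == 5, bearing, NA_real_)
}

# Rows 1, 4 and 6 of the example data
final.bearing(strongest = c(-11072, -11071, -11068),
              secondstrongest = c(-11707, -11561, -11698),
              strongestantenna = c("value120", "value120", "value0"),
              secondstrongantenna = c("value60", "value240", "value60"))
# roughly 105.48, NA, 14.62
```

Using ifelse() keeps everything vectorized, so the function can be applied to whole columns of a data frame at once.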
>
>> I have been trying to use a series of if,else statements to produce these
>> bearings, but all I am producing is errors. Any suggestions would be
>> appreciated.
>
>
>>
>> Again THANKS for your efforts.
>>
>> Mike
>>
>> ---- Original message ----
>>>
>>> Date: Fri, 23 Jul 2010 23:01:56 -0700
>>> From: Joshua Wiley <jwiley.psych at gmail.com>
>>> Subject: Re: [R] Trouble retrieving the second largest value from each
>>> row of  a data.frame
>>> To: mpward at illinois.edu
>>> Cc: r-help at r-project.org
>>>
>>> Hi,
>>>
>>> Here is a little function that will do what you want and return a nice
>>> output:
>>>
>>> #Function To calculate top two values and return
>>> my.finder <- function(mydata) {
>>> my.fun <- function(data) {
>>>  strongest <- which.max(data)
>>>  secondstrongest <- which.max(data[-strongest])
>>>  strongestantenna <- names(data)[strongest]
>>>  secondstrongantenna <- names(data[-strongest])[secondstrongest]
>>>  value <- matrix(c(data[strongest], data[secondstrongest],
>>>                    strongestantenna, secondstrongantenna), ncol =4)
>>>  return(value)
>>> }
>>> dat <- apply(mydata, 1, my.fun)
>>> dat <- t(dat)
>>> dat <- as.data.frame(dat, stringsAsFactors = FALSE)
>>> colnames(dat) <- c("strongest", "secondstrongest",
>>>                   "strongestantenna", "secondstrongantenna")
>>> dat[ , "strongest"] <- as.numeric(dat[ , "strongest"])
>>> dat[ , "secondstrongest"] <- as.numeric(dat[ , "secondstrongest"])
>>> return(dat)
>>> }
>>>
>>>
>>> #Using your example data:
>>>
>>> yourdata <- structure(list(value0 = c(-13007L, -12838L, -12880L, -12805L,
>>> -12834L, -11068L, -12807L, -12770L, -12988L, -11779L), value60 =
>>> c(-11707L,
>>> -13210L, -11778L, -11653L, -13527L, -11698L, -14068L, -11665L,
>>> -11736L, -12873L), value120 = c(-11072L, -11176L, -11113L, -11071L,
>>> -11067L, -12430L, -11092L, -11061L, -11137L, -12973L), value180 =
>>> c(-12471L,
>>> -11799L, -12439L, -12385L, -11638L, -12430L, -11709L, -12373L,
>>> -12570L, -12537L), value240 = c(-12838L, -13210L, -13089L, -11561L,
>>> -13527L, -12430L, -11607L, -11426L, -13467L, -12973L), value300 =
>>> c(-13357L,
>>> -13845L, -13880L, -13317L, -13873L, -12814L, -13025L, -12805L,
>>> -13739L, -11146L)), .Names = c("value0", "value60", "value120",
>>> "value180", "value240", "value300"), class = "data.frame", row.names =
>>> c("1",
>>> "2", "3", "4", "5", "6", "7", "8", "9", "10"))
>>>
>>> my.finder(yourdata) #and what you want is in a nicely labeled data frame
>>>
>>> #A potential problem is that it is not very efficient
>>>
>>> #Here is a test using a matrix of 100,000 rows
>>> #sampled from the same range as your data
>>> #(note: 5 columns here, one fewer than your data)
>>>
>>> data.test <- matrix(
>>> sample(seq(min(yourdata),max(yourdata)), size = 500000, replace = TRUE),
>>> ncol = 5)
>>>
>>> system.time(my.finder(data.test))
>>>
>>> #On my system I get
>>>
>>>> system.time(my.finder(data.test))
>>>
>>>  user  system elapsed
>>>  2.89    0.00    2.89
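
If the row-by-row apply() turns out to be too slow for a couple million lines, one possible alternative is to avoid the per-row function call with max.col(). The following is a sketch only: the function name top.two is made up, it assumes named columns and no missing values, and it has not been timed against the numbers above.

```r
top.two <- function(mydata) {
  m <- as.matrix(mydata)
  rows <- seq_len(nrow(m))

  # Column position of the largest value in each row (first one on ties)
  i1 <- max.col(m, ties.method = "first")
  strongest <- m[cbind(rows, i1)]

  # Mask the largest value with something below every real value,
  # then take the row maximum again to get the second largest
  masked <- m
  masked[cbind(rows, i1)] <- min(m) - 1
  i2 <- max.col(masked, ties.method = "first")

  data.frame(strongest = strongest,
             secondstrongest = m[cbind(rows, i2)],
             strongestantenna = colnames(m)[i1],
             secondstrongantenna = colnames(m)[i2],
             stringsAsFactors = FALSE)
}

# Rows 1 and 6 of the example data
m <- rbind(c(-13007, -11707, -11072, -12471, -12838, -13357),
           c(-11068, -11698, -12430, -12430, -12430, -12814))
colnames(m) <- c("value0", "value60", "value120",
                 "value180", "value240", "value300")
top.two(m)
```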
>>>
>>> Hope that helps,
>>>
>>> Josh
>>>
>>>
>>>
>>> On Fri, Jul 23, 2010 at 6:20 PM,  <mpward at illinois.edu> wrote:
>>>>
>>>> I have a data frame with a couple million lines and want to retrieve the
>>>> largest and second largest values in each row, along with the label of the
>>>> column these values are in. For example
>>>>
>>>> row 1
>>>> strongest=-11072
>>>> secondstrongest=-11707
>>>> strongestantenna=value120
>>>> secondstrongantenna=value60
>>>>
>>>> Below is the code I am using and a truncated data.frame.  Retrieving the
>>>> largest value was easy, but I have been getting errors every way I have
>>>> tried to retrieve the second largest value.  I have not even tried to
>>>> retrieve the labels for the value yet.
>>>>
>>>> Any help would be appreciated
>>>> Mike
>>>>
>>>>
>>>>> data<-data.frame(value0,value60,value120,value180,value240,value300)
>>>>> data
>>>>
>>>>  value0 value60 value120 value180 value240 value300
>>>> 1  -13007  -11707   -11072   -12471   -12838   -13357
>>>> 2  -12838  -13210   -11176   -11799   -13210   -13845
>>>> 3  -12880  -11778   -11113   -12439   -13089   -13880
>>>> 4  -12805  -11653   -11071   -12385   -11561   -13317
>>>> 5  -12834  -13527   -11067   -11638   -13527   -13873
>>>> 6  -11068  -11698   -12430   -12430   -12430   -12814
>>>> 7  -12807  -14068   -11092   -11709   -11607   -13025
>>>> 8  -12770  -11665   -11061   -12373   -11426   -12805
>>>> 9  -12988  -11736   -11137   -12570   -13467   -13739
>>>> 10 -11779  -12873   -12973   -12537   -12973   -11146
>>>>>
>>>>> #largest value in the row
>>>>> strongest<-apply(data,1,max)
>>>>>
>>>>>
>>>>> #second largest value in the row
>>>>> n<-function(data)(1/(min(1/(data[1,]-max(data[1,]))))+ (max(data[1,])))
>>>>> secondstrongest<-apply(data,1,n)
>>>>
>>>> Error in data[1, ] : incorrect number of dimensions
>>>>>
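
The error above arises because apply() hands each row to the function as a plain vector, so data[1, ] has no second dimension to index. A minimal sketch of the same formula with the [1, ] dropped (the argument name x is arbitrary):

```r
# apply() passes each row as a plain vector x, so index x directly
n <- function(x) 1 / min(1 / (x - max(x))) + max(x)

# Rows 1 and 6 of the example data
m <- rbind(c(-13007, -11707, -11072, -12471, -12838, -13357),
           c(-11068, -11698, -12430, -12430, -12430, -12814))
secondstrongest <- apply(m, 1, n)
round(secondstrongest)   # -11707 -11698
```

The round() is there because the two reciprocals can leave a tiny floating point error. Note that this formula returns the next distinct value, so a row where the maximum occurs twice does not report the tie.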
>>>>
>>>> ______________________________________________
>>>> R-help at r-project.org mailing list
>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>> PLEASE do read the posting guide
>>>> http://www.R-project.org/posting-guide.html
>>>> and provide commented, minimal, self-contained, reproducible code.
>>>>
>>>
>>>
>>>
>>> --
>>> Joshua Wiley
>>> Ph.D. Student, Health Psychology
>>> University of California, Los Angeles
>>> http://www.joshuawiley.com/
>>
>
>



-- 
Joshua Wiley
Ph.D. Student, Health Psychology
University of California, Los Angeles
http://www.joshuawiley.com/


