[R] Extracting data from dataframe with tied rows

Peter Ehlers ehlers at ucalgary.ca
Fri Aug 24 19:51:09 CEST 2012


Here's another pretty straightforward solution, using the plyr pkg:

  DF <- data.frame(id, month, distance, bearing)
    # variables as defined in the OP

  require(plyr)
  DF1<-ddply(DF, .(id,month), summarize,
        maxdist = max(distance),
        maxbearing = bearing[which.max(distance)])
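For comparison, here is a minimal base-R sketch of the same which.max() idea without the plyr dependency. The small data frame is hypothetical and stands in for the OP's simulated vectors. split() groups the row numbers by id and month. which.max() then picks the first row holding the largest distance in each group.

```r
# Hypothetical toy data standing in for the OP's id/month/distance/bearing
DF <- data.frame(
  id       = c("A", "A", "A", "B", "B"),
  month    = c(1, 1, 2, 1, 1),
  distance = c(99.5, 101.2, 100.3, 98.7, 100.9),
  bearing  = c(10, 200, 45, 300, 90)
)

# Split row numbers by id x month (drop = TRUE skips empty combinations),
# then keep, per group, the row with the largest distance
groups <- split(seq_len(nrow(DF)), list(DF$id, DF$month), drop = TRUE)
best   <- vapply(groups, function(i) i[which.max(DF$distance[i])], integer(1))

DF1 <- DF[best, c("id", "month", "distance", "bearing")]
DF1 <- DF1[order(DF1$id, DF1$month), ]   # order by id, then month
names(DF1)[3:4] <- c("maxdist", "maxbearing")
rownames(DF1) <- NULL
DF1
```

Because the row index is carried through, the bearing always comes from the same row as the maximum distance.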

Peter Ehlers

On 2012-08-24 09:54, William Dunlap wrote:
> Or use ave() to compute the within-group ranks (reversed, so max has rank 1) and select
> the elements whose ranks are 1:
> f2 <- function (DATA)
> {
>      stopifnot(is.data.frame(DATA), all(c("distance", "id", "month") %in%
>          names(DATA)))
>      revRanks <- ave(DATA[["distance"]], DATA[["id"]], DATA[["month"]],
>          FUN = function(x) rank(-x, ties.method = "first"))
>      DATA[revRanks == 1, ]
> }
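The sketch below illustrates how f2() treats ties. It uses a hypothetical toy data frame in which two rows share the maximum distance. The function body is reproduced from the post with the full argument name ties.method; the original `ties = "first"` relies on R's partial matching of argument names. Because ranks are assigned with "first", exactly one row per id/month survives, namely the earliest of the tied rows.

```r
f2 <- function(DATA) {
  stopifnot(is.data.frame(DATA),
            all(c("distance", "id", "month") %in% names(DATA)))
  # rank(-x) gives the largest distance rank 1; "first" breaks ties by position
  revRanks <- ave(DATA[["distance"]], DATA[["id"]], DATA[["month"]],
                  FUN = function(x) rank(-x, ties.method = "first"))
  DATA[revRanks == 1, ]
}

# Hypothetical toy data: group A/1 has two rows tied at distance 105
toy <- data.frame(
  id       = c("A", "A", "A", "B"),
  month    = c(1, 1, 1, 1),
  distance = c(100, 105, 105, 97),
  bearing  = c(10, 20, 30, 40)
)
res <- f2(toy)
res   # one row per id/month; for A it is the first of the tied rows
```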
>
> Bill Dunlap
> Spotfire, TIBCO Software
> wdunlap tibco.com
>
>
>> -----Original Message-----
>> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf
>> Of Peter Alspach
>> Sent: Thursday, August 23, 2012 4:37 PM
>> To: rjb; r-help at r-project.org
>> Subject: Re: [R] Extracting data from dataframe with tied rows
>>
>> Tena koe (hello) John
>>
>> One way:
>>
>> johnData <- data.frame(id=rep(LETTERS[1:5],20), distance=rnorm(100, mean = 100),
>> bearing=sample(1:360,100,replace=TRUE), month=sample(1:12,100,replace=TRUE))
>> johnAgg <- aggregate(johnData[,'distance'], johnData[,c('id','month')], max)
>> names(johnAgg)[3] <- 'distance'
>> merge(johnAgg, johnData)
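The sketch below shows one property of the aggregate()/merge() approach. merge() joins on id, month and distance, so every row that equals its group maximum is returned. If two rows tie at the maximum, both come back. The toy data are hypothetical. The final duplicated() step is one possible way to keep a single row per id/month; which of the tied rows it keeps depends on the order merge() produces.

```r
# Hypothetical toy data with a tie: both A/1 rows share the maximum distance
toy <- data.frame(
  id       = c("A", "A", "B"),
  distance = c(105, 105, 97),
  bearing  = c(20, 30, 40),
  month    = c(1, 1, 1)
)

# Same steps as above: per-group maximum, then join back on the shared columns
agg <- aggregate(toy[, "distance"], toy[, c("id", "month")], max)
names(agg)[3] <- "distance"
joined <- merge(agg, toy)
joined   # A/1 appears twice (bearings 20 and 30)

# Keep one row per id/month if a single bearing is wanted
single <- joined[!duplicated(joined[, c("id", "month")]), ]
single
```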
>>
>> HTH ....
>>
>> Peter Alspach
>>
>> -----Original Message-----
>> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf
>> Of rjb
>> Sent: Friday, 24 August 2012 9:19 a.m.
>> To: r-help at r-project.org
>> Subject: [R] Extracting data from dataframe with tied rows
>>
>> Hi R help,
>>
>> I'm a fairly experienced R user, but this manipulation has me stumped; please
>> help:
>>
>> DATA
>> id<-rep(LETTERS[1:5],20)
>> distance<-rnorm(100, mean = 100)
>> bearing<-sample(1:360,100,replace=TRUE)
>> month<-sample(1:12,100,replace=TRUE)
>> DATA<-data.frame(id,distance,bearing,month)
>>
>> I have a dataset with records of individuals (id), each with a distance
>> (distance) and direction (bearing) recorded for each month (month).
>> I want to find the largest distance per individual per month, which is easy
>> with tapply() or with melt()/cast() from the reshape package:
>> library(reshape)
>> head(DATA_m<-melt(DATA,id=c("id","month")))
>> cast(DATA_m,id+month~.,max)
>> OR
>> na.omit(melt(tapply(distance,list(id,month),max)))
>>
>> *BUT THE CATCH IS*,
>> I also want the *corresponding* bearing for that maximum distance per
>> month. I've tried the steps above plus which.max() and loops, but
>> can't solve the problem. The real dataset is about 6000 rows.
>>
>> I'm guessing the answer lies in finding the row number in the original DATA,
>> but I can't figure out how to do that with tapply or melt.
>>
>> Any suggestions would be greatly appreciated.
>>
>> John Burnside
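John's guess about row numbers is workable with tapply() itself. The idea is to apply tapply() to the row numbers instead of to distance, so that each id-by-month cell returns the index of the row holding the maximum distance. The minimal sketch below uses the OP's simulated data. The set.seed() call is an addition for reproducibility, and `rows` and `best` are illustrative names.

```r
set.seed(1)  # added only so the sketch is reproducible
id       <- rep(LETTERS[1:5], 20)
distance <- rnorm(100, mean = 100)
bearing  <- sample(1:360, 100, replace = TRUE)
month    <- sample(1:12, 100, replace = TRUE)
DATA     <- data.frame(id, distance, bearing, month)

# For each id x month cell, return the row number of the largest distance;
# cells with no observations come back as NA and are dropped
rows <- tapply(seq_len(nrow(DATA)), list(DATA$id, DATA$month),
               function(i) i[which.max(DATA$distance[i])])
rows <- as.vector(rows)
rows <- rows[!is.na(rows)]

best <- DATA[rows, ]   # full rows, so bearing matches the maximum distance
best
```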



