[R] subsetting a data.frame

Peter Dalgaard P.Dalgaard at biostat.ku.dk
Wed Oct 10 16:56:01 CEST 2007


jim holtman wrote:
> Is this what you want?
>
> > x <- read.table(textConnection("Score     Name
> + 88           000019_0070
> + 88           000019_0070
> + 87           000019_0070
> + 79           002127_0658
> + 79           002127_0658
> + 77           002127_0658"), header=TRUE)
> > # return best scores
> > best <- by(x, x$Name, function(.nam){
> +     .nam[which(.nam$Score == max(.nam$Score)),]
> + })
> > do.call('rbind', best)
>               Score        Name
> 000019_0070.1    88 000019_0070
> 000019_0070.2    88 000019_0070
> 002127_0658.4    79 002127_0658
> 002127_0658.5    79 002127_0658
>
Or (same idea, really), with the data frame called d here rather than x:

> do.call(rbind,lapply(split(d, d$Name), subset, Score==max(Score)))
              Score        Name
000019_0070.1    88 000019_0070
000019_0070.2    88 000019_0070
002127_0658.4    79 002127_0658
002127_0658.5    79 002127_0658
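
For anyone who wants to run the split/lapply version from scratch, here is a
minimal self-contained sketch. It assumes the same six rows as in the quoted
message, read into a data frame named d, and it uses explicit bracket indexing
instead of subset(); that substitution is mine, chosen only because it avoids
non-standard evaluation inside lapply(). The helper name keep_best is made up
for illustration.

```r
# Same six rows as in the quoted message (assumed to be the full data set).
d <- read.table(text = "Score Name
88 000019_0070
88 000019_0070
87 000019_0070
79 002127_0658
79 002127_0658
77 002127_0658", header = TRUE)

# Hypothetical helper: keep the rows of one group that attain the group maximum.
keep_best <- function(g) g[g$Score == max(g$Score), ]

# split() cuts d into one data frame per Name, lapply() filters each piece,
# and do.call(rbind, ...) stacks the pieces back into a single data frame.
pieces <- split(d, d$Name)
res <- do.call(rbind, lapply(pieces, keep_best))
res
```

Ties are kept, so both rows scoring 88 and both rows scoring 79 come back. The
result is ordered by group, not by original row position.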

Another idea, with the advantage of leaving data in the original order:

> ix <- d$Score == ave(d$Score, d$Name, FUN=max)
> d[ix,]
  Score        Name
1    88 000019_0070
2    88 000019_0070
4    79 002127_0658
5    79 002127_0658
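
To see why the ave() version keeps the original order, it may help to look at
the intermediate vector. The sketch below again assumes the six rows from the
quoted message in a data frame named d; grp_max is a name introduced here only
for illustration.

```r
# Same six rows as in the quoted message (assumed to be the full data set).
d <- read.table(text = "Score Name
88 000019_0070
88 000019_0070
87 000019_0070
79 002127_0658
79 002127_0658
77 002127_0658", header = TRUE)

# ave() returns a vector as long as d$Score in which every element is
# replaced by FUN applied to its own group, here the group maximum.
grp_max <- ave(d$Score, d$Name, FUN = max)

# Row-by-row comparison gives a logical index aligned with the rows of d,
# so subsetting with it never reorders anything.
ix <- d$Score == grp_max
d[ix, ]
```

Because ix lines up with the rows of d, the selected rows keep their original
row names (1, 2, 4, 5) rather than picking up a group prefix.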


-- 
   O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark          Ph:  (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)                  FAX: (+45) 35327907


