[R] select duplicate identifier with higher mean across sample columns

Rui Barradas ruipbarradas at sapo.pt
Sun Nov 4 20:40:21 CET 2012


Hello,

Thanks for the data example. (You forgot to define samp2a, so I read the data from your printed output instead.)
Try the following.


mdf <- read.table(text="
id samp1 samp2 samp2a
1  A   100   110    110
2  A   120   130    150
3  C   101   131    151
4  D   110   150    130
5  E   132   122    122
6  F   123   143    143
", header=TRUE)

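# Within each id, flag the row(s) whose mean across the sample columns is the largest
idx <- ave(rowMeans(mdf[,-1]), mdf$id, FUN = function(x) x == max(x))
# ave() returns the flags as numeric 0/1, so convert to logical before subsetting
mdf[as.logical(idx), ]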
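
The question also mentions the inter-quartile range as a possible criterion, and the comparison x == max(x) keeps every row that ties for the maximum. What follows is only a sketch of one way to cover both points, not a definitive solution: pick_best is a hypothetical helper name, and keeping the first row on a tie is my assumption, since the original post does not say how ties should be handled.

```r
mdf <- read.table(text = "
id samp1 samp2 samp2a
A 100 110 110
A 120 130 150
C 101 131 151
D 110 150 130
E 132 122 122
F 123 143 143
", header = TRUE)

# Row-wise scores over the sample columns (everything except id)
score_mean <- rowMeans(mdf[, -1])
score_iqr  <- apply(mdf[, -1], 1, IQR)

# Hypothetical helper: keep, for each id, the first row that attains
# the largest score, so that ties still give one row per id
pick_best <- function(df, score) {
  keep <- ave(score, df$id, FUN = function(x) seq_along(x) == which.max(x))
  df[as.logical(keep), ]
}

best_by_mean <- pick_best(mdf, score_mean)
best_by_iqr  <- pick_best(mdf, score_iqr)
```

With this data both criteria should keep the second of the two A rows; the other ids have a single row each and are returned unchanged.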


Hope this helps,

Rui Barradas
On 04-11-2012 19:25, Adrian Johnson wrote:
> Hi Group:
> I searched R groups before posting this question. I could not find the
> appropriate answer and I do not have clear understanding how to do
> this in R.
>
> I have a data frame with duplicated row identifiers but with different
> values across columns. I want to select the identifier with higher
> inter-quartile range or mean.
>
>
>   id <- c("A", "A", "C", "D", "E", "F")
>   year <- c(2000, 2001, 2001, 2002, 2003, 2004)
>   samp1 <- c(100, 120, 101, 110, 132,123)
>   samp2 <- c(110, 130, 131, 150, 122,143)
>   mdf <- data.frame(id,samp1,samp2,samp2a)
>
>
>> mdf
>    id samp1 samp2 samp2a
> 1  A   100   110    110
> 2  A   120   130    150
> 3  C   101   131    151
> 4  D   110   150    130
> 5  E   132   122    122
> 6  F   123   143    143
>
>
> There are two A ids in this df. I want to select the row with higher mean.
>
> How can I do this?
> Thanks
> Adrian
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.



