[R] select duplicate identifier with higher mean across sample columns

jim holtman jholtman at gmail.com
Sun Nov 4 20:39:12 CET 2012


Is this what you want?

> mdf <- read.table(text = "  id samp1 samp2 samp2a
+ 1  A   100   110    110
+ 2  A   120   130    150
+ 3  C   101   131    151
+ 4  D   110   150    130
+ 5  E   132   122    122
+ 6  F   123   143    143", header = TRUE)
> result <- do.call(rbind, lapply(split(mdf, mdf$id), function(.id){
+     maxIndx <- which.max(rowMeans(.id[, -1L]))
+     .id[maxIndx, ]
+ }))
>
> result
  id samp1 samp2 samp2a
A  A   120   130    150
C  C   101   131    151
D  D   110   150    130
E  E   132   122    122
F  F   123   143    143
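
The code above splits the data frame by id, finds the row with the largest
row mean within each piece (ignoring the id column), and binds the winners
back together. Since the question also mentions the inter-quartile range,
here is a rough sketch of the same idea with the scoring function passed in
as an argument. Note that pick_best_row, id_col and score are names I made up
for illustration, not base R functions or arguments, and this assumes every
column other than the id column is a numeric sample column.

```r
# Same data as in the example above.
mdf <- read.table(text = "id samp1 samp2 samp2a
A 100 110 110
A 120 130 150
C 101 131 151
D 110 150 130
E 132 122 122
F 123 143 143", header = TRUE)

# Hypothetical helper: for each id, keep the row with the highest score.
# 'score' is any function that maps a numeric vector to a single number.
pick_best_row <- function(df, id_col = "id", score = mean) {
  # Assumption: all columns except the id column hold numeric samples.
  sample_cols <- setdiff(names(df), id_col)
  # One score per row, computed across the sample columns only.
  row_score <- apply(as.matrix(df[, sample_cols, drop = FALSE]), 1L, score)
  # For each id, the original row index of its best-scoring row
  # (which.max keeps the first row when scores are tied).
  best <- vapply(split(seq_len(nrow(df)), df[[id_col]]),
                 function(rows) rows[which.max(row_score[rows])],
                 integer(1))
  # Return the selected rows in their original order.
  df[sort(best), , drop = FALSE]
}

by_mean <- pick_best_row(mdf)               # highest row mean
by_iqr  <- pick_best_row(mdf, score = IQR)  # highest inter-quartile range
```

With only three sample columns the IQR is a fairly coarse measure, so the two
criteria may well disagree on real data; which one to use depends on what you
are trying to solve.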


On Sun, Nov 4, 2012 at 2:25 PM, Adrian Johnson
<oriolebaltimore at gmail.com> wrote:
> Hi Group:
> I searched the R groups before posting this question. I could not find
> an appropriate answer, and I do not have a clear understanding of how
> to do this in R.
>
> I have a data frame with duplicated row identifiers but with different
> values across columns. For each duplicated identifier, I want to keep
> the row with the higher inter-quartile range or mean.
>
>
>  id <- c("A", "A", "C", "D", "E", "F")
>  year <- c(2000, 2001, 2001, 2002, 2003, 2004)
>  samp1 <- c(100, 120, 101, 110, 132, 123)
>  samp2 <- c(110, 130, 131, 150, 122, 143)
>  samp2a <- c(110, 150, 151, 130, 122, 143)  # values taken from the printed mdf below
>  mdf <- data.frame(id, samp1, samp2, samp2a)
>
>
>> mdf
>   id samp1 samp2 samp2a
> 1  A   100   110    110
> 2  A   120   130    150
> 3  C   101   131    151
> 4  D   110   150    130
> 5  E   132   122    122
> 6  F   123   143    143
>
>
> There are two rows with id A in this data frame. I want to select the
> one with the higher mean across the sample columns.
>
> How can I do this?
> Thanks
> Adrian
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.



-- 
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.



