[BioC] Calculating mean values corresponding to duplicate row names

Thu Sep 24 23:24:43 CEST 2009

On Thu, Sep 24, 2009 at 2:49 PM, Hari Easwaran <hariharan.pe at gmail.com> wrote:
> Hi all,
> I have a table (t) of the following format (first row is the header):
> A       x1        x2
> c       1         NA
> c       2         1002
> c       3         NA
> a       4         1004
> b       5         NA
> c       6         1006
> c       7         1007
> c       8         1008
> b       9         1009
> a       10        1010
> a       11        1011
> c       12        1012
> c       13        1013
> a       14        1014
> c       NA        1015
>
>
> I want to find the mean of all the values corresponding to the row names
> "a", "b", "c" (which are duplicated).
> I tried the following which works:
> U <- unique(t$A)
> tt <- t(sapply(U, FUN=function(u) {mean(na.omit(t[t$A==u, ]))}))

Take a look at the aggregate() function.
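
As a minimal sketch of what that could look like on the example table above: the data frame is called `tab` here (a name chosen for illustration, so that it does not mask base R's t() function), and it is rebuilt by hand from the posted values. Passing na.rm = TRUE through to mean() drops NAs column by column, which is an assumption about what is wanted; it is not the same as na.omit() on the whole table, which discards every row containing any NA. Note also that in current versions of R, mean() no longer accepts a data frame, so the sapply() approach quoted above may need adjusting in any case.

```r
# Rebuild the example table from the original post. `tab` is a
# hypothetical name used only for this sketch.
tab <- data.frame(
  A  = c("c", "c", "c", "a", "b", "c", "c", "c", "b", "a", "a", "c", "c", "a", "c"),
  x1 = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, NA),
  x2 = c(NA, 1002, NA, 1004, NA, 1006, 1007, 1008, 1009, 1010, 1011,
         1012, 1013, 1014, 1015)
)

# One mean per numeric column for each distinct value of A.
# na.rm = TRUE is forwarded to mean(), so NAs are ignored per column.
res <- aggregate(tab[, c("x1", "x2")], by = list(A = tab$A),
                 FUN = mean, na.rm = TRUE)
print(res)
```

For the example data this should give 9.75 and 1009.75 for "a", 7 and 1009 for "b", and 6.5 and 1009 for "c". Whether this is fast enough on 44K rows by 100 columns would need to be timed on the real table.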

Sean

> However, in reality the table t is really huge (almost 44K rows and 100
> columns). The above approach takes too long. Is there an alternative
> that anyone can think of?
>
> Thanks a lot for any help/suggestions.
>
> Sincerely,
> Hari