[R] [BioC] colnames and get means for the columns with the "same" names

Weiwei Shi helprhelp at gmail.com
Mon Nov 6 23:40:16 CET 2006


Hi,

I played around with these two functions but did not get what I wanted.
So I wrote a function that uses a loop instead, and it runs in a
reasonable time:
> system.time(t3 <- iconix.convert(processed, 9, 7486, probes2llid.genego[,c(2,5)]))
[1] 12.356  4.494 16.836  0.000  0.000
> dim(t3)
[1]  129 4255

I am more interested in the alternatives to "averaging". I will
search the archive, since this is a very common problem in microarray
analysis.

I am posting my function here in case someone needs it in the future.

iconix.convert <- function(orig, st=9, ed=7486, c.table){
    t1 <- orig[, st:ed]

    # treat missing values as 0 (note: these zeros then count toward the averages below)
    t1 <- sapply(t1, function(x){ x[is.na(x)] <- 0; x})

    x0 <- unique(c.table[,2])
    out <- matrix(0, dim(t1)[1], length(x0))
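    j <- 1
    for (i in x0){
        # as.character() so a factor column is not used as integer codes when indexing
        avg.col <- as.character(c.table[c.table[,2]==i, 1])
        if (length(avg.col) > 1){ # several probe ids map to this id: average them
            t2 <- rowMeans(t1[, avg.col])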
        }
        else{
            t2 <- t1[, avg.col]
        }
        out[,j] <- t2
        j <- j + 1
    }
    out <- as.data.frame(out)
    colnames(out) <- x0
    out2 <- cbind(orig[, c(1:(st-1))], out, orig[,c((ed+1):dim(orig)[2])])
    colnames(out2)[dim(out2)[2]] <- "Group"
    out2
}
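
For comparison, here is a minimal sketch of the same per-ID averaging
without an explicit loop, using rowsum() on the transposed matrix. The
toy values in x1 are invented for illustration; only the Probe_ID and
HUMAN_LLID entries come from the conversion table in the quoted
question. It keeps the same missing-value rule (NA becomes 0) and is
not meant as a drop-in replacement for iconix.convert.

```r
# Toy data: 3 samples x 4 probes (values invented for illustration)
x1 <- data.frame(AF106325_PROBE1  = c(1, 2, NA),
                 NM_019386_PROBE1 = c(3, 4, 5),
                 NM_012907_PROBE1 = c(10, 20, 30),
                 AW917796_PROBE1  = c(7, 8, 9))
c.table <- data.frame(Probe_ID   = c("AF106325_PROBE1", "NM_019386_PROBE1",
                                     "NM_012907_PROBE1", "AW917796_PROBE1"),
                      HUMAN_LLID = c(7052, 7052, 339, 84196),
                      stringsAsFactors = FALSE)

m <- as.matrix(x1)
m[is.na(m)] <- 0                       # same missing-value rule as above

# LLID for each column of m, matched by probe name
grp <- c.table$HUMAN_LLID[match(colnames(m), c.table$Probe_ID)]

# rowsum() sums rows by group, so work on the transpose (probes in rows)
sums   <- rowsum(t(m), group = grp, reorder = FALSE)
# number of probes per LLID, in the same first-appearance order as sums
counts <- as.vector(table(factor(grp, levels = unique(grp))))
avg    <- t(sums / counts)             # samples x LLIDs
avg
```

Columns of avg are named by LLID, so the result can be cbind()-ed back
to the annotation columns in the same way as in iconix.convert.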



On 11/6/06, Davis, Sean (NIH/NCI) [E] <sdavis2 at mail.nih.gov> wrote:
> Hi, Weiwei.
>
> You probably want to look at a combination of merge() to combine your data with your conversion table followed by aggregate().  Read up on the help for those two functions and that should do it, if I understand what you want to do.  However, keep in mind that "averaging" the probesets representing the same gene may not represent the best solution.  Also, if you search the archive a bit, I know this question has come up before.
>
> Sean
>
>
>
> -----Original Message-----
> From: Weiwei Shi [mailto:helprhelp at gmail.com]
> Sent: Mon 11/6/2006 4:53 PM
> To: r-help
> Cc: bioconductor
> Subject: [BioC] colnames and get means for the columns with the "same" names
>
> hi,
> I have a conversion table for colnames like this:
>           Probe_ID HUMAN_LLID
> 1  AF106325_PROBE1       7052
> 2 NM_019386_PROBE1       7052
> 3 NM_012907_PROBE1        339
> 4  AW917796_PROBE1      84196
> 5    L27651_PROBE1      10864
>
> Probe_ID contains the colnames of another data.frame, say x1.
> I need to convert these colnames to another ID system, HUMAN_LLID, by
> using the table. Columns of x1 that map to the same HUMAN_LLID need to
> be averaged. Is there a good way to do it?
>
> I also put this question in bioconductor since I believe it might be
> solved by some package.
>
> thanks.
>
> --
> Weiwei Shi, Ph.D
> Research Scientist
> GeneGO, Inc.
>
> "Did you always know?"
> "No, I did not. But I believed..."
> ---Matrix III
>

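The merge() followed by aggregate() route suggested in the quoted
message could look roughly like the sketch below. This is only an
illustration under assumptions: the values in x1 are invented, the
object names (conv, long, merged, agg, res) are made up here, and the
data are transposed first so that Probe_ID becomes a key column that
merge() can join on.

```r
# Toy data: 2 samples x 3 probes (values invented for illustration)
x1 <- data.frame(AF106325_PROBE1  = c(1, 2),
                 NM_019386_PROBE1 = c(3, 4),
                 NM_012907_PROBE1 = c(10, 20))
conv <- data.frame(Probe_ID   = c("AF106325_PROBE1", "NM_019386_PROBE1",
                                  "NM_012907_PROBE1"),
                   HUMAN_LLID = c(7052, 7052, 339),
                   stringsAsFactors = FALSE)

# probes as rows; the sample columns get default names X1, X2, ...
long <- data.frame(Probe_ID = colnames(x1), t(as.matrix(x1)),
                   stringsAsFactors = FALSE)

# attach the LLID to every probe row
merged <- merge(conv, long, by = "Probe_ID")

# average every sample column within each HUMAN_LLID
sample.cols <- setdiff(names(merged), c("Probe_ID", "HUMAN_LLID"))
agg <- aggregate(merged[, sample.cols, drop = FALSE],
                 by = list(HUMAN_LLID = merged$HUMAN_LLID), FUN = mean)

# back to a samples x LLID layout
res <- t(as.matrix(agg[, sample.cols, drop = FALSE]))
colnames(res) <- agg$HUMAN_LLID
res
```

aggregate() sorts its output by the grouping variable, so the LLID
columns come out in sorted order rather than in order of first
appearance in the conversion table.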



