[R] Help on averaging sets of rows defined by row name

ONKELINX, Thierry Thierry.ONKELINX at inbo.be
Fri Apr 20 15:53:42 CEST 2007


Dear Marije,

I think that aggregate() would make your life a lot easier.

aggregate(table.imputed[, -1], by = list(gene = table.imputed[, 1]),
          FUN = mean, na.rm = TRUE)
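
As a rough illustration of this kind of call, here is a minimal sketch on a small made-up data frame. The toy values and the column names gene, sample1 and sample2 are assumptions for the example, not the real data. Note that `by` is supplied as a list and the gene-name column is left out of the columns being averaged.

```r
# Toy stand-in for table.imputed (hypothetical data, not from the original post):
# column 1 holds gene names, the remaining columns hold per-sample expression values.
table.imputed <- data.frame(
  gene    = c("geneA", "geneA", "geneB", "geneB", "geneB"),
  sample1 = c(1, 3, 10, 20, 30),
  sample2 = c(2, 4, NA, 5, 7),
  stringsAsFactors = FALSE
)

# Average every sample column within each gene.
# 'by' must be a list; na.rm = TRUE is passed through to mean().
table.averaged <- aggregate(
  table.imputed[, -1],
  by = list(gene = table.imputed[, 1]),
  FUN = mean,
  na.rm = TRUE
)

print(table.averaged)
```

The result has one row per gene name and one column per sample, which is the shape the later script expects from table.averaged.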

Cheers,

Thierry

----------------------------------------------------------------------------
ir. Thierry Onkelinx
Instituut voor natuur- en bosonderzoek / Research Institute for Nature
and Forest
Cel biometrie, methodologie en kwaliteitszorg / Section biometrics,
methodology and quality assurance
Gaverstraat 4
9500 Geraardsbergen
Belgium
tel. + 32 54/436 185
Thierry.Onkelinx at inbo.be
www.inbo.be 

Do not put your faith in what statistics say until you have carefully
considered what they do not say.  ~William W. Watt
A statistical analysis, properly conducted, is a delicate dissection of
uncertainties, a surgery of suppositions. ~M.J.Moroney

 

> -----Original message-----
> From: r-help-bounces at stat.math.ethz.ch
> [mailto:r-help-bounces at stat.math.ethz.ch] On behalf of Booman, M
> Sent: Friday 20 April 2007 15:27
> To: r-help at stat.math.ethz.ch
> Subject: [R] Help on averaging sets of rows defined by row name
> 
> Dear all,
> 
> This is my problem: I have a table of gene expression data,
> where the 1st column is the gene name and the 2nd to 39th columns
> each hold expression data for one of 38 samples. There are multiple
> measurements per sample for each gene, so there are multiple
> rows for each gene name. I want to average these measurements
> so I end up with one value per sample for each gene name. The
> output data frame (table.averaged) is used further in another R
> script. The code I use now (see below) takes 20 secs for each
> loop, so it takes 45 minutes to average my files of 13500
> unique genes. Can anyone help me do this faster?
> 
> Cheers, marije
> 
> Code I use: 
> 
> 
> # table.imputed is a data.frame: 1st column = gene name (class factor),
> # remaining columns = expression data (class numeric)
> table.imputed[,1] <- as.character(table.imputed[,1])
>
> # Make a list of the unique genes in the set
> genesunique <- unique(table.imputed[,1])
>
> # Defined before the loop so that the script runs from top to bottom
> givemean <- function (gene, table.imputed) {
>    # Make a subset containing only the rows for one gene name
>    thisgene <- table.imputed[table.imputed[,1]==gene,]
>    # Calculate the average for each sample (column) and output one row
>    # of average values together with the gene name
>    data.frame(gene, t(sapply(thisgene[,2:ncol(thisgene)], mean, na.rm=TRUE)))
> }
>
> table.averaged <- NULL
> for (j in 1:length(genesunique)) {
>    # Report progress
>    if (j%%100 == 0) {
>       cat(j, "genes finished", sep=" ", fill=TRUE)
>    }
>    # Collect all rows of average values and bind them into one data frame
>    table.averaged <- rbind(table.averaged, givemean(genesunique[j], table.imputed))
> }
> 
> 
> The contents of this message are confidential and only 
> intended for the eyes of the addressee(s). Others than the 
> addressee(s) are not allowed to use this message, to make it 
> public or to distribute or multiply this message in any way. 
> The UMCG cannot be held responsible for incomplete reception 
> or delay of this transferred message.
>


