[BioC] How is LIMMA actually calculating the average expression value

Tue Jun 11 09:19:58 CEST 2013

Dear Dr. Smyth.
Thanks for your reply and let me apologize for probably miscalling the terms and/or not being clear enough when posting my question, microarrays data analysis is just a part of my study but definitely is giving me hard time to interpret my factorial with orthogonal contrast analysis.
Before posting my previous question, I read the LIMMA user guide section 12.1 which states the the concept you are mentioning for AveEXpr in top table and you are right, the concept is clear with respect to the top table. 
However the average expression I mentioned is the one I got when using all the next code:
rawData <- ReadAffy()
esetgcrma <- gcrma(rawData)
eset1 <- exprs(esetgcrma)
gene.rm<-(strsplit2(rownames(eset1),"-")[,1]=="AFFX")
eset2 <- eset1[!gene.rm,]
GCRMA values per each array for probe:   Bt.17364.1.A1_at
4367_SFA_HLA.CEL	4368_CTL_HLA.CEL	4381_SFA_LLA.CEL	4384_EFA_HLA.CEL	4387_EFA_HLA.CEL	4388_EFA_LLA.CEL
11.340	8.709	10.048	10.167	9.589	12.664

4394_CTL_HLA.CEL	4395_CTL_LLA.CEL	4396_SFA_HLA.CEL	4398_EFA_LLA.CEL	4399_SFA_LLA.CEL	4400_CTL_HLA.CEL
10.829	11.232	11.044	10.238	9.575	8.164

4402_EFA_HLA.CEL	4404_CTL_LLA.CEL	4409_EFA_LLA.CEL	4410_SFA_HLA.CEL	4413_CTL_LLA.CEL	4429_SFA_LLA.CEL
10.828	9.296	11.651	11.522	10.007	9.460
###############################################################
phenoDiet <- read.table("designDD_MR.txt",header=T,sep="\t")
write.exprs(esetgcrma, file="affy_ALL.gcrma.xls")
TS <- paste(phenoDiet$DD, phenoDiet$MR, sep=".")
TS
TS <- factor(TS, levels=c("Ctl.LLA", "Ctl.HLA","SFA.LLA","SFA.HLA","EFA.LLA", "EFA.HLA"))
design <- model.matrix(~0+TS)
colnames(design) <- levels(TS)
fit <- lmFit(eset2, design, method="robust", maxit=1000000)
efit <- eBayes(fit)       
#Contrast results
MatContrast=makeContrasts(FAT=(SFA.LLA + SFA.HLA + EFA.LLA + EFA.HLA)/4 - (Ctl.LLA + Ctl.HLA)/2,
                          FA=(EFA.LLA + EFA.HLA)/2 - (SFA.LLA + SFA.HLA)/2,
                          MR=(EFA.HLA+SFA.HLA+Ctl.HLA)/3 - (EFA.LLA+SFA.LLA+Ctl.LLA)/3,
                          FATbyMR=((EFA.HLA+SFA.HLA)/2 - Ctl.HLA) - ((EFA.LLA+SFA.LLA)/2-Ctl.LLA), 
                          FAbyMR=(EFA.HLA-SFA.HLA)-(EFA.LLA - SFA.LLA),
                          levels=design)
fitMat<-contrasts.fit(fit,MatContrast)
fit2=eBayes(fitMat)
top=24016
#commands for tables to save the table of results: This commands resulted in the top tables containing:
**** I am listing an example for a specific probe*****

TopContrastALL=topTable(fit2,coef=NULL,number=top)  
write.table(TopContrastALL,"All_logFC_contrasts.xls",sep="\t")
ID	FAT	FA	MR	FATbyMR	FAbyMR	AveExpr	F	P.Value	adj.P.Val
Bt.17364.1.A1_at	1.3331	0.4183	-0.3757	1.3361	-3.0755	10.3534	14.5472	0.0001	0.0009

TopContrast1=topTable(fit2,coef=1,number=top)
write.table(TopContrast1,"FAT_contrast.xls",sep="\t")
	ID	logFC	AveExpr	t	P.Value	adj.P.Val	B
5483	Bt.17364.1.A1_at	1.333	10.353	5.369	0.0002	0.0026	-2.337

TopContrast2=topTable(fit2,coef=2,number=top)
write.table(TopContrast2,"FA_contrast.xls",sep="\t")
	ID	logFC	AveExpr	t	P.Value	adj.P.Val	B
5483	Bt.17364.1.A1_at	0.418	10.353	1.530	0.15	0.59	-9.04

TopContrast3=topTable(fit2,coef=3,number=top)
write.table(TopContrast3,"MR_contrast.xls",sep="\t")

TopContastr4=topTable(fit2,coef=4,number=top)
write.table(TopContastr4,"FATbyMR_contrast.xls",sep="\t")

TopContrast5=topTable(fit2,coef=5,number=top)
write.table(TopContrast5,"FAbyMR_contrast.xls",sep="\t")

#Tables for the values of F statistics summ.allcontrast=topTable(efit,coef=NULL,adjust.method="BH",number=top)     # table of differentially expressed probesets
write.table(summ.allcontrast,"Fstat_AveExpr_allTRT.xls",sep="\t")
ID	Ctl.LLA	Ctl.HLA	SFA.LLA	SFA.HLA	EFA.LLA	EFA.HLA	AveExpr	F	P.Value	adj.P.Val
Bt.17364.1.A1_at	10.007	8.741	9.694	11.302	11.651	10.182	10.353	1384.0	1.01E-16	1.30E-16
summ.FAT=topTable(efit, coef=1,adjust.method="BH",number=top)
write.table(summ.FAT,"Fstat_FATseparate.xls",sep="\t")
	ID	logFC	AveExpr	t	P.Value	adj.P.Val	B
5483	Bt.17364.1.A1_at	10.007	10.353	34.372	1.24E-13	1.62E-13	18.270

summ.FA=topTable(efit, coef=2,adjust.method="BH",number=top)
write.table(summ.FA,"Fstat_FAseparate.xls",sep="\t")
	ID	logFC	AveExpr	t	P.Value	adj.P.Val	B
5483	Bt.17364.1.A1_at	8.741	10.353	29.622	7.6E-13	9.4E-13	1.6E+01

summ.MR=topTable(efit, coef=3,adjust.method="BH",number=top)
write.table(summ.MR,"Fstat_MRseparate.xls",sep="\t")

summ.FATbyMR=topTable(efit, coef=4,adjust.method="BH",number=top)
write.table(summ.FAT,"Fstat_FATbyMRseparate.xls",sep="\t")

summ.FAbyMR=topTable(efit, coef=5,adjust.method="BH",number=top)
write.table(summ.FAbyMR,"Fstat_FAbyMRseparate.xls",sep="\t")

ALSO, Thanks to your recommendation,
 I review my code and rerun the data using 2 options of topeTable one with the fit2 command and other with the efit as posted with the example I am not sure how the average expression value as presented with the command summ.allcontrast=topTable(efit,coef=NULL,adjust.method="BH",number=top). 
Calculating the AveExpr for each group of 3 arrays with the GCRMA log2 values, it give me quite similar results as that on the summary table for some of the six treatments but for others average expression valus simply do not match.
In addition when running specific contrast as it is the FAT contrast when using efit or fit2, both approaches give me the same AveExpr value but log FC, T value, B, and P values. The use of fit2 give the FC values as if I were manually calculating with the AvExpr obtained in the later table above. So I think the right one table to use if that generate with the fit2, but so I am wonder what is the Ftable with efit calculating as log FC???

Thanks so much for all, 

Regards,
Miriam

********************************
Miriam Garcia, MS, PhD
Department of Animal Sciences
University of Florida

________________________________________
From: Gordon K Smyth [smyth at wehi.EDU.AU]
Sent: Monday, June 10, 2013 8:36 PM
To: Garcia Orellana,Miriam
Cc: Bioconductor mailing list
Subject: How is LIMMA actually calculating the average expression value

Dear Miriam,

> Date: Sun, 9 Jun 2013 10:28:41 +0000
> From: "Garcia Orellana,Miriam" <mgarciao at ufl.edu>
> To: Bioconductor mailing list ?[bioconductor at r-project.org]?
>       <bioconductor at r-project.org>
> Subject: [BioC] How is LIMMA actually calculating the average
>       expression value?
>
> Dear all:

> I have found quite the same question being asked at least twice but I
> still not have clear answer about how the ebayes method in LIMMA
> calculate the average expression value for a given experimental group.

The limma package does not and never has computed the expression level for
individual experimental groups.  The AveExpr columin is the average of all
arrays (ie for all groups not for one group).  That is clearly documented,
or so it seems to me.

The limma User's Guide gives in Section 13.1 a description of all
quantities output by topTable.  It says

   "The AveExpr column gives the average log2-expression level for
   that gene across all the arrays and channels in the experiment."

That seems to be me to be completely unambiguous.  What is unclear about
it?

The help page ?topTable says that AveExpr is

   "average log2-expression for the probe over all arrays and channels,
   same as Amean in the MarrayLM object"

The help page ?"MArrayLM-class" says that Amean is a

   "numeric vector containing the average log-intensity for each probe over
   all the arrays in the original linear model fit. Note this vector does
   not change when a contrast is applied to the fit using contrasts.fit."

Again this seem to me to be unambiguous.

I've said the same thing in response to questions on this list several
times.  What is unclear?

> I have used GCRMA to normalize my affimetrix values and then obtained
> the log2 expression values as (values below do not necessarily
> correspond to the same probe):
>
> 4395_CTL_LLA.CEL        :7.89
> 4404_CTL_LLA.CEL:        8.21
> 4413_CTL_LLA.CEL:       8.07
> I have calculated by excel:
> Simple mean = 8.055
> Geometric mean = 8.054
> Whereas the top table for average expression of these 3 values gave me: 8.055

There is no such thing as a "top table for average expression".  Top
tables are always for comparisons between groups.  I have no idea what you
are trying to do.

Could you please read the documentation, and have a look at the posting
guide?  If you post again, please give the whole code leading to this
output, and give expression for all arrays in your experiment, not just
three.

Best wishes
Gordon

> This values are quite the same regardless of calculation method, However
> when more variability is among values the calculated average expression
> differs differs quite largely:

> 4368_CTL_HLA.CEL:       8.26
> 4394_CTL_HLA.CEL:       7.17
> 4400_CTL_HLA.CEL:       8.70
> Simple mean = 8.042
> Geometric mean = 8.015
> Whereas the top table for average expression of these 3 values gave me: 8.263. In this case this average expression value seems to be the median but on the next set of samples not.

> 4368_CTL_HLA.CEL:       10.758
> 4394_CTL_HLA.CEL:       10.907
> 4400_CTL_HLA.CEL:       7.634
> Simple mean = 8.92
> Geometric mean = 9.766
> Whereas the top table for average expression of these 3 values gave me: 9.862, which in this case is not at all close to the median.
>
> I will appreciate any help on this matter. It will also be appreciated,
> any additional though on whether the adjusted average expression
> (whichever the method is) is well enough to correct for expression
> variability within a given treatment, so I do not need to be worry for
> any potential outlier expression value or should I be concerned about?
>
> Regards,
> Miriam
>
> ********************************
> Miriam Garcia, MS, PhD
> Department of Animal Sciences
> University of Florida
>

______________________________________________________________________
The information in this email is confidential and intend...{{dropped:6}}