[BioC] Question about amount of diff expressed genes in fit using a design matrix with intercept

James W. MacDonald jmacdon at uw.edu
Sat Nov 24 16:00:13 CET 2012


Hi Belisa,

On 11/24/2012 9:23 AM, Belisa Santos [guest] wrote:
> Hello everybody,
>
>      I have used a design matrix with a control as intercept. When I fit the matrix with my RMA normalized expression set, do eBayes, and then ask for a topTable:
> topTable(fit, adjust="BH", p.value=1e-5, lfc=1, number=Inf)
> I get the same number of genes as I have input (54675)... and the lowest adjusted p-value is e-29... Now it cannot be the case where ALL my genes are differentially expressed between control and treatments....

But that isn't what you are testing with your call to topTable(). You 
are asking if *any* coefficient is not equal to zero, and your intercept 
will almost always (or in your case, always) be different from zero.

>
> I am missing something? The intercept functions as a "contrast", right? Is this common?  Could I be doing something wrong? Please help me...

No, the intercept functions as a baseline, to which all other samples 
are compared. The intercept is estimating the mean expression of the 
control samples, and all the other coefficients are estimating the 
difference between a given sample and the control (intercept).

I am not sure what you expected to get from calling topTable() without 
specifying a coefficient, but what it does is compute an F-test for all 
coefficients. Which is testing the null hypothesis that all coefficients 
= 0. This will almost never be true of an intercept, as it is measuring 
the mean expression of the controls, which will usually be larger than 
zero (hence all significant in your topTable).

You might want to know which genes are significant in one or more of 
your coefficients, in which case you would do

topTable(fit, 2:17)

Otherwise, you usually want to specify just one or a few coefficients to 
test at one time, because knowing that a gene is significant in one or 
more of 16 different comparisons is not really that helpful.

Best,

Jim


>
> Thank you in advance for you kind help. All the best,
>
> Belisa
>
>
>   -- output of sessionInfo():
>
> # Code I am using,
> cel<-ReadAffy
>
> expression_set<- rma(cel)
>
> fit<- lmFit(expression_set, design.ctrl_intercept)
> fit<- eBayes(fit)
> topTable(fit, adjust.method="BH", p.value=1e-5, lfc=1, number=Inf)
>
> # My design matrix
>
>     ctrl0 B1.ctrl BT1.ctrl BI1.ctrl BTI1.ctrl B2.ctrl BT2.ctrl BI2.ctrl BTI2.ctrl B3.ctrl BT3.ctrl BI3.ctrl BTI3.ctrl B7.ctrl BT7.ctrl BI7.ctrl BTI7.ctrl
> 1      1  0   0   0    0  0   0   0    0  0   0   0    0  0   0   0    0
> 2      1  0   0   0    0  0   0   0    0  0   0   0    0  0   0   0    0
> 3      1  0   0   0    0  0   0   0    0  0   0   0    0  0   0   0    0
> 4      1  0   0   0    0  0   0   0    0  0   0   0    0  0   0   0    0
> 5      1  0   0   0    0  0   0   0    0  0   0   0    0  0   0   0    0
> 6      1  0   0   0    0  0   0   0    0  0   0   0    0  0   0   0    0
> 7      1  1   0   0    0  0   0   0    0  0   0   0    0  0   0   0    0
> 8      1  1   0   0    0  0   0   0    0  0   0   0    0  0   0   0    0
> 9      1  1   0   0    0  0   0   0    0  0   0   0    0  0   0   0    0
> 10     1  0   1   0    0  0   0   0    0  0   0   0    0  0   0   0    0
> 11     1  0   1   0    0  0   0   0    0  0   0   0    0  0   0   0    0
> 12     1  0   1   0    0  0   0   0    0  0   0   0    0  0   0   0    0
> 13     1  0   0   1    0  0   0   0    0  0   0   0    0  0   0   0    0
> 14     1  0   0   1    0  0   0   0    0  0   0   0    0  0   0   0    0
> 15     1  0   0   1    0  0   0   0    0  0   0   0    0  0   0   0    0
> (...)
> 52     1  0   0   0    0  0   0   0    0  0   0   0    0  0   0   0    1
> 53     1  0   0   0    0  0   0   0    0  0   0   0    0  0   0   0    1
> 54     1  0   0   0    0  0   0   0    0  0   0   0    0  0   0   0    1
>
> --
> Sent via the guest posting facility at bioconductor.org.
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor

-- 
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099



More information about the Bioconductor mailing list