[BioC] limma all adj.P.Val the same

Mon May 21 22:54:43 CEST 2012

Hi Sam,

Please don't take things off-list. We like to think of the list archives 
as a searchable resource, so any off-list conversations obviate that use.

On 5/21/2012 1:40 PM, Sam McInturf wrote:
> Jim,
> here is my target file and three topTables.
>> target
>                                      FileName
> Timing   Sample          Rep
> 1         1 Wheat Genome 4-26-12_(wheat).CEL      ANTH     SPIKE            A
> 2   10 Wheat Genome Array 5-3-12_(wheat).CEL   ANTH     PEDUNCLE   A
> 3   11 Wheat Genome Array 5-3-12_(wheat).CEL   ANTH     STEM           A
> 4   12 Wheat Genome Array 5-3-12_(wheat).CEL   ANTH     FLAGLEAF   A
> 5   13 Wheat Genome Array 5-3-12_(wheat).CEL   ANTH     NODES        A
> 6   14 Wheat Genome Array 5-3-12_(wheat).CEL  DAA14    SPIKE           A
> 7   15 Wheat Genome Array 5-3-12_(wheat).CEL  DAA14    PEDUNCLE   A
> 8   16 Wheat Genome Array 5-3-12_(wheat).CEL  DAA14    STEM            A
> 9   17 Wheat Genome Array 5-3-12_(wheat).CEL  DAA14    FLAGLEAF   A
> 10  18 Wheat Genome Array 5-3-12_(wheat).CEL  DAA14   NODES        A
> 11  19 Wheat Genome Array 5-8-12_(wheat).CEL  ANTH     SPIKE          B
> 12        2 Wheat Genome 4-26-12_(wheat).CEL     ANTH     PEDUNCLE   B
> 13  20 Wheat Genome Array 5-8-12_(wheat).CEL  ANTH     STEM           B
> 14  21 Wheat Genome Array 5-8-12_(wheat).CEL   ANTH    FLAGLEAF   B
> 15  22 Wheat Genome Array 5-8-12_(wheat).CEL   ANTH    NODES        B
> 16  23 Wheat Genome Array 5-8-12_(wheat).CEL  DAA14   SPIKE           B
> 17  24 Wheat Genome Array 5-8-12_(wheat).CEL  DAA14   PEDUNCLE   B
> 18  25 Wheat Genome Array 5-8-12_(wheat).CEL  DAA14   STEM           B
> 19  26 Wheat Genome Array 5-8-12_(wheat).CEL  DAA14   FLAGLEAF   B
> 20  27 Wheat Genome Array 5-8-12_(wheat).CEL  DAA14   NODES        B
> 21  28 Wheat Genome Array 5-8-12_(wheat).CEL  ANTH    SPIKE           C
> 22 29 Wheat Genome Array 5-10-12_(wheat).CEL ANTH     PEDUNCLE   C
> 23        3 Wheat Genome 4-26-12_(wheat).CEL     ANTH     STEM           C
> 24  30 Wheat Genome Array 5-8-12_(wheat).CEL   ANTH FLAGLEAF      C
> 25        4 Wheat Genome 4-26-12_(wheat).CEL   ANTH    NODES            C
> 26        5 Wheat Genome 4-26-12_(wheat).CEL  DAA14    SPIKE            C
> 27        6 Wheat Genome 4-26-12_(wheat).CEL  DAA14 PEDUNCLE       C
> 28        7 Wheat Genome 4-26-12_(wheat).CEL  DAA14     STEM            C
> 29   8 Wheat Genome Array 5-3-12_(wheat).CEL  DAA14 FLAGLEAF      C
> 30   9 Wheat Genome Array 5-3-12_(wheat).CEL  DAA14    NODES        C
>
>
>> topTable(fit, coef = "Spike")
>                             ID                 logFC         AveExpr
>        t      P.Value        adj.P.Val         B
> 54657    TaAffx.78175.1.S1_at  0.4215108 3.216996  4.514525
> 0.0001717272 0.9999869 -3.269251
> 40437    TaAffx.13113.3.S1_at  0.4911042 3.912458  4.156361
> 0.0004119435 0.9999869 -3.402249
> 55069    TaAffx.79041.1.S1_at  0.3726910 2.666774  4.063081
> 0.0005172708 0.9999869 -3.438121
> 39227    TaAffx.12387.2.S1_at -0.7312518 3.934821 -3.968346
> 0.0006516554 0.9999869 -3.475043
> 33239   TaAffx.107821.1.S1_at  0.3374931 3.718697  3.957133
> 0.0006696985 0.9999869 -3.479445
> 53443    TaAffx.66663.2.S1_at  0.5686773 4.155875  3.925068
> 0.0007240767 0.9999869 -3.492069
> 43703    TaAffx.28502.1.S1_at  0.3998569 4.607502  3.915422
> 0.0007412758 0.9999869 -3.495877
> 36163   TaAffx.112860.1.S1_at  0.3813519 3.401501  3.899640
> 0.0007702914 0.9999869 -3.502118
> 35389   TaAffx.111828.1.S1_at  0.5061669 4.090638  3.799882
> 0.0009815673 0.9999869 -3.541854
> 33890 TaAffx.108878.1.S1_x_at  0.3026730 3.772136  3.779824
> 0.0010305082 0.9999869 -3.549902
>

Ugh. Stupid line wraps.

So, three things. First, with only three replications you won't be able 
to reliably detect very small changes (and the fold changes you are 
seeing here are very small, barely even 1.5-fold on the natural scale).

Second, the unadjusted p-values are pretty big, so it isn't surprising 
that you have such large adjusted p-values.

Third, the average expression levels you are showing are really small. I 
have no experience with the Wheat array, but for the mammalian versions 
of the 3'-biased arrays I am used to seeing much larger expression 
values. This may be expected, but if I were you I would be doing some 
exploratory data analysis (image plots, density plots, etc, maybe use 
arrayQualityMetrics package).

You could also do a PCA plot to see if your samples are grouping as you 
expect. I often fit array weights in limma and append them to the PCA 
plot to see if weighting might help (see plotPCA, c.f. addtext argument 
in affycoretools).

Best,

Jim

>> topTable(fit, coef = "ABSURD")
>                           ID                logFC        AveExpr
>   t          P.Value        adj.P.Val         B
> 32597 TaAffx.106447.1.S1_at -0.7516767 3.714621 -4.451899 0.0002001067
> 0.9979636 -3.291944
> 1598       Ta.10910.1.A1_at -0.4429634 3.524978 -4.417483 0.0002176579
> 0.9979636 -3.304519
> 15208      Ta.25816.1.S1_at -0.4721049 6.752687 -4.246018 0.0003309222
> 0.9979636 -3.368238
> 57638   TaAffx.8440.1.S1_at  0.4725505 3.198740  4.106344 0.0004654457
> 0.9979636 -3.421423
> 8110      Ta.1735.2.S1_a_at  0.5125682 3.656730  4.019142 0.0005757797
> 0.9979636 -3.455186
> 50758  TaAffx.57221.1.S1_at -0.4975976 3.789495 -4.005497 0.0005952539
> 0.9979636 -3.460506
> 58501  TaAffx.85842.1.S1_at -0.4646994 3.862172 -4.004599 0.0005965581
> 0.9979636 -3.460857
> 30327       Ta.9283.2.S1_at -0.3770003 3.214838 -3.980915 0.0006320034
> 0.9979636 -3.470117
> 9090     Ta.18507.1.S1_s_at -0.4499434 2.592315 -3.962537 0.0006609422
> 0.9979636 -3.477323
> 44052  TaAffx.29281.1.S1_at -0.5103668 3.360355 -3.933707 0.0007090089
> 0.9979636 -3.488663
>
>> topTable(fit, coef = "ABSURD", p.value = 0.05)
> data frame with 0 columns and 0 rows
>
> Best!
>
> Samuel
>
>
> On Mon, May 21, 2012 at 12:22 PM, James W. MacDonald<jmacdon at uw.edu>  wrote:
>> Hi Sam,
>>
>>
>> On 5/21/2012 12:12 PM, Sam McInturf wrote:
>>> Hello Everyone,
>>>     I am working on expression profiling of wheat, where five tissue
>>> types were analyzed at anthesis (flowering) and 14 days after anthesis
>>> (DAA14).  Because of the amount of development that occurs during the
>>> first two weeks of grain filling, we expect to see differentially
>>> expressed genes (Code below). I approached this as a 5x2 factorial
>>> problem and took my lead from the limma guide (ch 8.7).  When calling
>>> topTable for any contrast, the list of P.Values contains a range of
>>> values (small small, some large).  While the adj.P.Val column contains
>>> the same (large) value for every gene reported ( up to n = 10,000).
>>>
>>> The adj.P.Vals that comeback for each contrasts range from 0.87 to
>>> 0.9999..
>>>
>>> Note the "ABSURD" contrast, when comparing different tissue types, we
>>> expect a lot of DE genes, but still no DE reported
>>>
>>> I subsetted my data to only analyze one tissue across the two times,
>>> and then doing the analysis as below, and if it is approached it as a
>>> 'comparison of two groups', and still was returned adjusted p values
>>> that were all the same, high value.
>>>
>>> Any idea why my q-values are all the same?
>>
>> Without any more information it is difficult to say. The stock answer would
>> be that you don't have enough power to detect any differences. I have seen
>> this on occasion, especially when there isn't much replication.
>>
>> You could help us by giving your targets file and an example of one of your
>> topTables.
>>
>>
>>
>>
>>> ################################################
>>> library(limma)
>>> library(affy)
>>> #read in our data
>>> target<- readTargets("--------/TargetCEL.txt")
>>> data<- ReadAffy(filenames = target$FileName)
>>>
>>> eset<- rma(data)
>>> #Building our model.  Let us model this as a 5x2 factorial problem
>>> (five tissues, two times)
>>> factors<- paste(target$Timing, target$Sample, sep = ".")
>>> factors<- factor(factors, levels = c("ANTH.SPIKE",  "ANTH.PEDUNCLE",
>>> "ANTH.STEM", "ANTH.FLAGLEAF",  "ANTH.NODES", "DAA14.SPIKE",
>>> "DAA14.PEDUNCLE", "DAA14.STEM", "DAA14.FLAGLEAF", "DAA14.NODES"))
>>> design<- model.matrix(~0+factors)
>>> colnames(design)<- levels(factors)
>>> contrastMat.TissueWise<- makeContrasts(
>>>                   Spike = DAA14.SPIKE - ANTH.SPIKE,
>>>                   PEDUNCLE = DAA14.PEDUNCLE - ANTH.PEDUNCLE,
>>>                   STEM = DAA14.STEM - ANTH.STEM,
>>>                   FLAGLEAF = DAA14.FLAGLEAF - ANTH.FLAGLEAF,
>>>                   NODES = DAA14.NODES - ANTH.NODES,
>>>                   ABSURD = ANTH.STEM - DAA14.SPIKE,
>>>                    levels = design
>>>                   )
>>>
>>> fit3<- lmFit(eset, design)
>>> fit2<- contrasts.fit(fit3, contrastMat.TissueWise)
>>> fit<- eBayes(fit2)
>>>
>>> topTable(fit2, coef = ___, n = 100)
>>
>> I am assuming this is a typo (fit2 rather than fit)?
>>
>> Best,
>>
>> Jim
>>
>>
>>> --
>>> Sam McInturf
>>>
>>> _______________________________________________
>>> Bioconductor mailing list
>>> Bioconductor at r-project.org
>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>> Search the archives:
>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>
>> --
>> James W. MacDonald, M.S.
>> Biostatistician
>> University of Washington
>> Environmental and Occupational Health Sciences
>> 4225 Roosevelt Way NE, # 100
>> Seattle WA 98105-6099
>>
>
>

-- 
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099