[BioC] limma: instances of highly variable paired ratios, but very small p-values

Gordon K Smyth smyth at wehi.EDU.AU
Mon Nov 19 01:06:05 CET 2012


Dear Ryan,

Thanks, this is much clearer now.

There isn't any contradiction in the results you give, and the results are 
essentially as I would want them to be.

The aim of the moderated t-test is to test whether the log-ratios are 
consistently greater or lesser than zero, not to test whether the 
log-ratios are equal.  For each of the three genes you give, the three 
pairwise log-ratios are consistently greater than or less than zero.  For 
each gene, the three log-ratios are of the same sign, some quite large and 
none very small.  Hence we would conclude that the treatment (sleep 
duration) does have a worthwhile effect for these genes.

Note that the moderated t-test does not penalize a gene for having a large 
variance as much as the ordinary t-test would do.  Compared to the 
ordinary t-test, the moderated t-test gives more weight to the treatment 
fold change.  It can therefore be viewed as a compromise between the 
ordinary t-test and ranking by fold-change.

To see whether the compromise is being drawn, look at fit$df.prior.  If 
this equal to 3 (the residual df for you experiment), then the moderated 
t-test is giving equal weight to the genewise variance and to the global 
variance.  If df.prior is much larger than 3, then the moderated t-test is 
using mostly the global variance.  The larger the df.prior, the closer the 
moderated t-test will come to ranking genes by fold change.  The smaller 
the df.prior, the closer the moderated t-test will come to ranking by 
ordinary t-test.  Larger genewise variances are penalized more when 
df.prior is low and less when df.prior is large.

I don't see a strong reason to suspect a problem in your analysis, but I 
should say that I do not necessarily recommend filtering by variance in 
conjunction with the eBayes procedure.  There is a risk that, by removing 
genes with small overall variances, you are changing the natural 
properties of the data, leading limma to think that the variances are more 
consistent than they actually are, and leading limma to set df.prior 
larger than would be ideal.  You will almost certainly find that, if you 
remove the filter-by-variance step, then limma will choose a lower value 
for df.prior than at present.

Best wishes
Gordon

---------------------------------------------
Professor Gordon K Smyth,
Bioinformatics Division,
Walter and Eliza Hall Institute of Medical Research,
1G Royal Parade, Parkville, Vic 3052, Australia.
http://www.statsci.org/smyth

On Sun, 18 Nov 2012, Ryan Basom wrote:

> Dear Gordon,
>
> I apologize about the lack of clarity in my initial post.  In my
> experiment, I'm working with three pairs of twins that differ in sleep
> duration.  In my targets file, the twins are paired by "Source," and are
> grouped by sleep duration - either "short" or "long."  I want try to
> identify significantly differentially expressed probes between the "short"
> and "long" groups.
>
>> targets
>   Sample Group Source
> 1 short.A short      A
> 2 short.B short      B
> 3 short.C short      C
> 4  long.A  long      A
> 5  long.B  long      B
> 6  long.C  long      C
>
>
> "d" is a data.frame that contains quantile normalized, log2 scaled signal
> intensity values, with samples represented in columns.  The data set has
> undergone signal intensity and variance filtering steps, resulting in
> 13,597 rows of values that were used with limma.
>
>> head(d)
>           long.A    long.B    long.C  short.A  short.B  short.C
> 1690577  9.717804 10.068060  9.969054 11.74844 10.84940 12.46522
> 2120040  9.234160  9.415976  9.463955 10.95052 10.42986 11.84757
> 3180438  9.724018  9.370251  9.758478 11.32368 10.51315 11.97926
> 3130324  8.980456  9.163279  9.203548 10.57555 10.13772 11.71771
> 6380255  9.624958  9.118592  9.364389 11.02301 10.20351 11.85628
> 6560164 10.097376  9.852433 10.089217 11.55226 10.78285 12.20962
>
>
> design<-model.matrix(~targets$Source+targets$Group)
> fit <- lmFit(d, design)
> fit <- eBayes(fit)
> results<-topTable(fit, coef="targets$Groupshort", number=nrow(d))
>
> What I'm puzzled about are instances of probes with very small p-values,
> when the sample paired ratios vary quite a bit.  Below are a few examples.
>
>> exampleProbes<-c(1450390,1230376,3420451)
>> results[results$ID %in% exampleProbes,]
>        ID     logFC   AveExpr         t      P.Value    adj.P.Val        B
> 13 1450390 -1.373309 10.210502 -9.807005 4.353149e-17 4.553059e-14 28.26206
> 14 1230376  1.439599 10.006936  9.785405 4.905855e-17 4.764637e-14 28.14655
> 18 3420451 -1.258524  8.028813 -8.903744 6.262297e-15 4.730470e-12 23.45921
>
>
>> signalIntensities<-d[row.names(d) %in% exampleProbes,]
>> signalIntensities
>          long.A    long.B    long.C   short.A   short.B   short.C
> 1450390 9.561842  9.508576  9.501124 10.839129 10.179997 11.672343
> 1230376 9.108591 10.499186 12.572430  8.752280  8.271599 10.837532
> 3420451 7.534823  7.447136  7.216693  8.519415  8.080696  9.374113
>
>> ratios<-signalIntensities[1:3]-signalIntensities[4:6]
>> ratios
>            long.A     long.B    long.C
> 1450390 -1.2772877 -0.6714210 -2.171219
> 1230376  0.3563107  2.2275875  1.734898
> 3420451 -0.9845919 -0.6335606 -2.157420
>
>
> Thank you very much for any assistance with this.
>
> Best,
>
> Ryan
>
>
>
>
> On Fri, Nov 16, 2012 at 10:36 PM, Gordon K Smyth <smyth at wehi.edu.au> wrote:
>
>> Dear Ryan,
>>
>> The limma algorithm is very well understood by now, and there are many
>> bioinformatications at the FHCRC who could probably answer your question.
>>
>> I find it hard to make a response to your email because I just see a
>> jumble of numbers.  I don't have a firm understanding of what your
>> experimental design is, what model you've fitted, what the numbers are that
>> you've calculated, or what you want me to see.  To comment more, I'd need
>> to see your experimental design and the complete pipeline of commands used
>> to generate the output given.
>>
>> Best wishes
>> Gordon
>>
>> I don't a firm idea of what your experimental design
>>
>>  Date: Wed, 14 Nov 2012 17:23:43 -0800
>>> From: Ryan Basom <rbasom at gmail.com>
>>> To: bioconductor at r-project.org
>>> Subject: [BioC] limma: instances of highly variable paired ratios, but
>>>         very small p-values
>>>
>>> Hi,
>>>
>>> I've performed a limma analysis on paired samples that were run on
>>> Illumina HT12 arrays, with three replicates in each condition.  I'm a bit
>>> troubled by the results though, as there are several probes that have very
>>> small adjusted p-values, though when looking at the paired ratio values,
>>> they vary quite a bit.  Here are a few examples where the comparison is
>>> long-short, and the samples are paired by the letters A,B,C.  After the
>>> adj.P.Val column, I've calculated the paired sample ratio values, these
>>> three columns are followed by the signal intensities from each sample:
>>>
>>>    ProbeID TargetID logFC AveExpr P.Value adj.P.Val long-short.A
>>> long-short.B long-short.C long.A long.B long.C short.A short.B short.C
>>> 1450390 RPL17 -1.3733092649 10.2105020267 4.35314891863083e-17
>>> 4.55305891127872e-14 -1.277287712 -0.6714209686 -2.1712191142 9.5618416199
>>> 9.5085763086 9.5011242541 10.8391293319 10.1799972771 11.6723433683 1230376
>>> ALAS2 1.4395987013 10.0069363572 4.9058551374517e-17 4.76463659313792e-14
>>> 0.356310701 2.2275874983 1.7348979044 9.1085909827 10.4991863428
>>> 12.5724297981 8.7522802817 8.2715988445 10.8375318936 3420451 RSL24D1
>>> -1.2585240828 8.0288125742 6.26229691539969e-15 4.73046950881609e-12
>>> -0.9845918613 -0.6335605827 -2.1574198045 7.5348228063 7.4471358372
>>> 7.2166929547 8.5194146676 8.0806964199 9.3741127592
>>>
>>>
>>> I'd assumed that limma would have been more sensitive to this and am
>>> wondering if anyone could please explain why this may be occurring.
>>>
>>> Thanks,
>>>
>>> Ryan
>>>
>>> __
>>> Ryan Basom
>>> Systems Analyst/Programmer
>>> Genomics Resource
>>> Fred Hutchinson Cancer Research Center

______________________________________________________________________
The information in this email is confidential and intend...{{dropped:4}}



More information about the Bioconductor mailing list