[BioC] LIMMA P-value calculations/Suggestions for flagged data

Thu Mar 22 12:49:31 CET 2007

> Date: Wed, 21 Mar 2007 16:04:31 -0400
> From: "Lance E. Palmer" <lance.palmer at stonybrook.edu>
> Subject: [BioC] LIMMA P-value calculations/Suggestions for flagged data
> To: bioconductor at stat.math.ethz.ch
>
> I just had a question/concern about p-value calculations in limma (I am
> using the latest version of Bioconductor).
>
> I recently ran 3 arrays through my analysis.  The slides were analyzed
> with GenePix software.  There were a couple of genes that concerned me.
> One had a log fold change of -3.765.  The adjusted p-value (FDR)
> was 0.027.  I looked at the individual M-values for the arrays and they
> were -0.009336, 0.09217 and -3.765.
>
> I noticed that the first two arrays had a 'not found' flag.  So
> basically the analysis gave a significant p-value using only one data
> value.  Is this correct?

Yes, it is correct.  If only one data value has weight > 0 for a particular probe, then limma
uses the empirical Bayes prior standard deviation, estimated from all the probes, to form the
t-statistic for that probe.

Think of it this way.  You observed M = -3.765 for this probe.  That's a large negative value.  You
know from looking at the other probes that the standard deviation of M-values is usually around
0.03, say, so -3.765 is very likely genuinely different from zero.
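The single-observation case can be sketched in Python for one probe with an intercept-only design (one log-ratio per array). This is a minimal sketch, not limma's code: it assumes the prior variance `s0_sq` and prior degrees of freedom `d0` are already known (in limma, eBayes() estimates them from all probes), and the `d0 = 4` in the usage lines is purely hypothetical. Flagged spots are modelled as weight 0, so they drop out.

```python
import math


def moderated_t(m_values, weights, s0_sq, d0):
    """Moderated t-statistic for one probe, intercept-only design.

    s0_sq and d0 are the empirical Bayes prior variance and prior degrees
    of freedom; here they are simply passed in (limma estimates them from
    all probes). Assumes d0 > 0.
    """
    # Observations with weight 0 (e.g. 'not found' spots) carry no information.
    obs = [(m, w) for m, w in zip(m_values, weights) if w > 0]
    if not obs:
        raise ValueError("no observations with weight > 0")
    w_sum = sum(w for _, w in obs)
    coef = sum(w * m for m, w in obs) / w_sum  # weighted mean log-ratio
    d = len(obs) - 1  # residual degrees of freedom
    # Probe-wise residual variance; with d == 0 there is none to estimate.
    s_sq = sum(w * (m - coef) ** 2 for m, w in obs) / d if d > 0 else 0.0
    # Posterior variance: df-weighted mix of prior and probe-wise variance.
    s_post_sq = (d0 * s0_sq + d * s_sq) / (d0 + d)
    stdev_unscaled = 1.0 / math.sqrt(w_sum)
    t = coef / (stdev_unscaled * math.sqrt(s_post_sq))
    return coef, s_post_sq, t, d0 + d


# Lance's probe: the first two spots were flagged 'not found' (weight 0).
# s0 = 0.03 is Gordon's illustrative figure; d0 = 4 is hypothetical.
coef, s_post_sq, t, df_total = moderated_t(
    [-0.009336, 0.09217, -3.765], [0, 0, 1], s0_sq=0.03 ** 2, d0=4.0
)
print(coef, s_post_sq, t, df_total)
```

With zero residual degrees of freedom the posterior variance collapses to `s0_sq` whatever `d0` is, so here the t-statistic is simply -3.765 / 0.03 = -125.5; `d0` matters only through the degrees of freedom (d0 + 0) used to turn that t into a p-value.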

> I also wonder whether I should remove 'not found' flagged data at all.
> Originally I did not, but someone suggested I should.  I originally did not
> remove it because of the case listed above.

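I've argued on this mailing list and elsewhere for a long time that, rather than flagging faint
spots, it's better to use a background correction method that avoids a blow-out of M-values at
low intensities.  Subtracting a local background estimate from a faint spot can leave a corrected
intensity near zero or even negative, which makes the log-ratio extreme or undefined.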
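A toy Python illustration of the blow-out, assuming simple local background subtraction. The intensities are made up, and the fixed offset is only a crude stand-in for a better method; limma's own backgroundCorrect() function offers model-based alternatives such as method="normexp", which this sketch does not implement.

```python
import math


def m_value(r_fg, r_bg, g_fg, g_bg, offset=0.0):
    """log2(R/G) after subtracting local background, plus an optional offset."""
    r = r_fg - r_bg + offset
    g = g_fg - g_bg + offset
    if r <= 0 or g <= 0:
        return float("nan")  # non-positive corrected intensity: log undefined
    return math.log2(r / g)


# Hypothetical intensities (foreground, background for red, then green):
# a faint spot barely above background, and a bright spot well above it.
faint = (105, 100, 150, 100)
bright = (10100, 100, 20100, 100)
for offset in (0.0, 50.0):
    print(offset, m_value(*faint, offset=offset), m_value(*bright, offset=offset))
```

For the faint spot, plain subtraction gives M = log2(5/50), about -3.32, from a red signal of only 5 units; adding an offset of 50 pulls it to log2(55/100), about -0.86, while the bright spot stays at about -1 either way.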

Best wishes
Gordon

> However, the case above tells us something about the experiments.  How
> do people deal with this situation?
>
> -Lance Palmer