[BioC] Expected number of DE genes?
Simon Anders
anders at embl.de
Wed Jul 16 08:12:02 CEST 2014
Dear Jessica
On 16/07/14 02:15, Jessica Perry Hekman wrote:
> Thanks, Tom. Yes, you summarized my dilemma well, although I am more
> concerned with false negatives right now than false positives (as we do
> intend to do PCR to validate any positives we get, but any false
> negatives lost are lost forever :).
I wonder whether you might have fallen for a fundamental but quite
common misunderstanding here, because false positives and false
negatives are not treated equally in a hypothesis test.
In both edgeR and DESeq, you choose a false discovery rate (FDR); in the
examples of the vignette, we use 10%, but this is by no means the only
useful value. This means that you ask DESeq2 to give you a list of genes
that are differentially expressed and that this list should not contain
more than 10% false positives, and that you are willing to accept as
many false negatives as it takes to ensure that.
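
A minimal sketch of how such an FDR cutoff turns per-gene p values into a list of calls, assuming the Benjamini-Hochberg adjustment is the one in use. The function names and the example p values are made up for illustration and are not taken from either package.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p values, in the original order."""
    m = len(pvalues)
    # Indices of the genes, sorted from smallest to largest p value.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down, so that adjusted values
    # never increase as the raw p value decreases.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def called_significant(pvalues, fdr=0.10):
    """Flag genes whose adjusted p value is below the chosen FDR."""
    return [p < fdr for p in benjamini_hochberg(pvalues)]


# Hypothetical p values for six genes (illustration only).
example_pvalues = [0.001, 0.008, 0.039, 0.041, 0.20, 0.60]
print(called_significant(example_pvalues, fdr=0.10))
```

Note that the only knob is the FDR: nothing in the procedure bounds how many truly changed genes end up without a call.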
More succinctly: If a gene is not called significant, this does not mean
that the algorithm thinks that it is not differentially expressed but
merely that it cannot say whether it is.
One other important issue is: What does "significantly differentially
expressed" actually mean? In biological systems, all components are so
highly interconnected that it seems implausible to think that there are
any genes which are not at all affected by your treatment, not even
slightly. I would argue that, in typical experiments, most if not all
genes change their expression strength at least a tiny bit in reaction
to treatment. The question is whether the difference that you observe
between the mean expression in treatment and control samples is driven
by this reaction to treatment, or whether it is mainly driven by random
fluctuations, i.e., by those differences that you also see when
comparing samples treated the same way (replicates). When the random
noise has the stronger effect, then the observed difference (log fold
change) will be in a random direction and may or may not be in the
direction that the treatment has affected the gene.
Hence my (somewhat personal) opinion on what a significant p value
means in DE analysis: We got the sign right.
A significant call means that we can have confidence in the observed
direction of the change. The effect of treatment on this gene was strong
enough that we can say with confidence whether the gene reacted with up-
or with down-regulation.
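
The point about the sign can be illustrated with a small simulation. This is only a sketch under simplifying assumptions: noise is Gaussian on the log scale (not the negative binomial model that DESeq2 and edgeR actually use), there are three replicates per group, and the function name and parameter values are hypothetical.

```python
import random


def fraction_sign_correct(true_lfc, noise_sd, n_reps=3, n_genes=5000, seed=1):
    """Fraction of simulated genes whose observed log fold change
    has the same sign as the true treatment effect."""
    rng = random.Random(seed)
    correct = 0
    for _ in range(n_genes):
        # Replicates differ only by random sample-to-sample noise.
        control = [rng.gauss(0.0, noise_sd) for _ in range(n_reps)]
        treated = [rng.gauss(true_lfc, noise_sd) for _ in range(n_reps)]
        observed_lfc = sum(treated) / n_reps - sum(control) / n_reps
        if (observed_lfc > 0) == (true_lfc > 0):
            correct += 1
    return correct / n_genes


# Weak effect relative to the noise: the observed direction is close to a coin flip.
print(fraction_sign_correct(true_lfc=0.1, noise_sd=1.0))
# Strong effect relative to the noise: the observed direction is almost always right.
print(fraction_sign_correct(true_lfc=3.0, noise_sd=1.0))
```

In both runs every simulated gene is truly affected by the treatment; what changes is only whether the data are good enough to tell in which direction.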
Hence, if you see fewer DE genes than expected, this means: The effect of
your treatment was too weak to be seen against the noise from random
sample-to-sample variation (or equivalently: the variation within
treatment groups was too strong and drowned the treatment signal). It
does not mean that there was no effect.
To judge whether your results are typical, you would need to tell us
more about your experiment.
Simon