[BioC] Expected number of DE genes?

Jessica Perry Hekman hekman2 at illinois.edu
Thu Jul 17 03:53:09 CEST 2014


Thanks, Simon. I did expect a fair amount of noise, and I think a lot of 
my surprise came from the fact that I expected the noise to be expressed 
as a large number of positives. Your explanation helps adjust my 
perspective.

Jessica


On 07/16/2014 01:12 AM, Simon Anders wrote:
> Dear Jessica
>
> On 16/07/14 02:15, Jessica Perry Hekman wrote:
>> Thanks, Tom. Yes, you summarized my dilemma well, although I am more
>> concerned with false negatives right now than false positives (as we do
>> intend to do PCR to validate any positives we get, but any false
>> negatives lost are lost forever :).
>
> I wonder whether you might have fallen for a fundamental but quite
> common misunderstanding here, because false positives and false
> negatives are not treated equally in a hypothesis test.
>
> In both edgeR and DESeq, you choose a false discovery rate (FDR); in the
> examples of the vignette, we use 10%, but this is by no means the only
> useful value. This means that you ask DESeq2 to give you a list of genes
> that are differentially expressed and that this list should not contain
> more than 10% false positives, and that you are willing to accept as
> many false negatives as it takes to ensure that.
>
> More succinctly: If a gene is not called significant, this does not mean
> that the algorithm thinks that it is not differentially expressed but
> merely that it cannot say whether it is.
>
>
> One other important issue is: What does "significantly differentially
> expressed" actually mean? In biological systems, all components are so
> highly interconnected that it seems implausible to think that there are
> any genes which are not at all affected by your treatment, not even
> slightly. I would argue that, in typical experiments, most if not all
> genes change their expression strength at least a tiny bit in reaction
> to treatment. The question is whether the difference that you observe
> between the mean expression in treatment and control samples is driven
> by this reaction to treatment, or whether it is mainly driven by random
> fluctuations, i.e., by those differences that you also see when
> comparing samples treated the same way (replicates). When the random
> noise has the stronger effect, then the observed difference (log fold
> change) will be in a random direction and may or may not be in the
> direction that the treatment has affected the gene.
>
> Hence, my (somewhat personal) opinion on what a significant p value
> means in DE analysis, namely: We got the sign right.
>
> A significant call means that we can have confidence in the observed
> direction of the change. The effect of treatment on this gene was strong
> enough that we can say with confidence whether the gene reacted with up-
> or with down-regulation.
>
> Hence, if you see fewer DE genes than expected, this means: The effect of
> your treatment was too weak to be seen against the noise from random
> sample-to-sample variation (or equivalently: the variation within
> treatment groups was too strong and drowned the treatment signal). It
> does not mean that there was no effect.
>
>
> To judge whether your results are typical, you would need to tell us
> more about your experiment.
>
>    Simon
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives:
> http://news.gmane.org/gmane.science.biology.informatics.conductor
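
A minimal sketch of the points made in the quoted message, not code from
the thread. The simulation gives every gene a non-zero true log fold
change, tests each gene, applies a Benjamini-Hochberg adjustment at a 10%
FDR, and then checks how often the observed sign matches the true sign
among called and uncalled genes. Everything in it is an assumption made
for illustration: the function names (bh_adjust, simulate), the parameter
values, normally distributed log-scale expression with a known noise SD,
and a plain z-test. It does not reproduce the models that edgeR or DESeq2
actually fit.

```python
import math
import random


def bh_adjust(pvalues):
    """Return Benjamini-Hochberg adjusted p-values, in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def simulate(n_genes=2000, n_reps=4, effect_sd=1.0, noise_sd=1.0,
             fdr=0.10, seed=1):
    """Toy experiment: every gene has some true effect (hypothetical setup)."""
    rng = random.Random(seed)
    # Standard error of a difference of two group means, noise SD known.
    se = noise_sd * math.sqrt(2.0 / n_reps)
    true_lfc, obs_lfc, pvalues = [], [], []
    for _ in range(n_genes):
        lfc = rng.gauss(0.0, effect_sd)  # true effect, never exactly zero
        control = [rng.gauss(0.0, noise_sd) for _ in range(n_reps)]
        treated = [rng.gauss(lfc, noise_sd) for _ in range(n_reps)]
        diff = sum(treated) / n_reps - sum(control) / n_reps
        true_lfc.append(lfc)
        obs_lfc.append(diff)
        # Two-sided z-test p-value for "no difference between groups".
        pvalues.append(math.erfc(abs(diff / se) / math.sqrt(2.0)))

    padj = bh_adjust(pvalues)
    # Did the observed direction of change match the true direction?
    sign_ok = [(o > 0) == (t > 0) for o, t in zip(obs_lfc, true_lfc)]
    called = [ok for ok, p in zip(sign_ok, padj) if p <= fdr]
    uncalled = [ok for ok, p in zip(sign_ok, padj) if p > fdr]

    def frac(flags):
        return sum(flags) / len(flags) if flags else float("nan")

    return {
        "n_called": len(called),
        "n_uncalled": len(uncalled),
        "sign_right_called": frac(called),
        "sign_right_uncalled": frac(uncalled),
    }


if __name__ == "__main__":
    for key, value in simulate().items():
        print(key, value)
```

In this toy setup the uncalled genes all have a true effect; what they
lack is a reliable observed direction. Lowering effect_sd or n_reps should
shrink the called list without making those effects disappear.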
