[BioC] Expected number of DE genes?

Wed Jul 16 00:23:48 CEST 2014

Hi Jessica,

Have you done some exploratory analysis on your dataset? A good place 
to start would be to generate MDS or PCA plots, using plotMDS for edgeR 
and plotPCA for DESeq2. Do your samples cluster into two groups as 
expected on these plots? Secondly, you should have a look at the dispersion 
estimates from both methods and compare them to typically observed 
values. You can do this with plotBCV for edgeR and plotDispEsts for 
DESeq2 (but remember that BCV is the square root of dispersion, so pay 
attention to the y-axis label). If your dispersions are high, this 
indicates that the variation within groups is large, which makes it 
difficult to detect significant differences between groups, so you 
will get fewer DE genes. The edgeR User's Guide says in section 2.10 that 
typical BCV values are 0.1 for genetically identical animals (e.g. lab 
mice) and 0.4 for human samples. The latter value is probably closer to 
what you should expect. If your BCV is a lot higher than that, your 
experiment may not be well-controlled, or there may be some other 
problem in the methods or the data that you need to track down.
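
To make the BCV versus dispersion comparison concrete, here is a small 
sketch in plain Python (not edgeR or DESeq2 code). It assumes the usual 
negative binomial mean-variance relationship, variance = mean + 
dispersion * mean^2, and the function names and the example mean count 
of 100 are made up for illustration only.

```python
import math


def bcv_to_dispersion(bcv):
    """BCV is the square root of the dispersion, so square it."""
    return bcv ** 2


def dispersion_to_bcv(dispersion):
    """Convert a dispersion back to the BCV scale."""
    return math.sqrt(dispersion)


def nb_variance(mean, dispersion):
    """Negative binomial variance (assumed model): mean + dispersion * mean^2."""
    return mean + dispersion * mean ** 2


if __name__ == "__main__":
    example_mean = 100  # hypothetical mean read count for one gene
    # The two typical BCV values quoted above (lab mice, human samples).
    for bcv in (0.1, 0.4):
        disp = bcv_to_dispersion(bcv)
        sd = math.sqrt(nb_variance(example_mean, disp))
        print(f"BCV {bcv:.1f} -> dispersion {disp:.2f}, "
              f"SD at mean {example_mean}: {sd:.1f}")
```

So a value that looks modest on a plotBCV axis corresponds to a much 
smaller number on a plotDispEsts axis, and the two plots should only be 
compared after converting to the same scale.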

Hope this helps,

-Ryan Thompson.

On Tue Jul 15 14:54:47 2014, Jessica Perry Hekman wrote:
> I'm getting only a few dozen differentially expressed genes when I
> analyze my RNA-Seq data with DESeq2 (79) and EdgeR (34) (even fewer
> when I use EBSeq). I had expected many more -- hundreds or even a
> thousand. If this is the real answer, I'm fine with it, but I'm
> concerned that I'm doing something wrong. What are the ranges of
> numbers of differentially expressed genes that one would expect from
> DESeq2 or EdgeR?
>
> More information:
>
> I'm in the midst of my first RNA-seq project (as many of you have
> probably surmised from my frequent postings to a variety of lists). My
> initial goal is to get a list of differentially expressed (DE) genes.
>
> I have 24 samples, 12 from each of 2 treatment groups.
>
> My species is fox (Vulpes vulpes), which aligns very nicely to dog
> (Canis familiaris).
>
> My current approach is to use the dog reference genome (to which my
> fox reads align at about 83%) + GTF with location of exons.
>
> Can I feel confident about DESeq2 and EdgeR's calls?
>
> Thanks very much for any insights,
>
> Jessica
>