[BioC] Expected number of DE genes?

Thomas H. Hampton Thomas.H.Hampton at dartmouth.edu
Wed Jul 16 01:52:13 CEST 2014


Dear Jessica,

The answer to this sort of question depends on the question you are actually asking. 

You no doubt know a) that in this sort of test many genes are expected to reach 
nominal significance by chance and b) that software packages include algorithms 
intended to differentiate between these predicted "false positives" and "true positives".
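
To put a rough number on point a): if no gene were truly changed, the expected count of genes reaching nominal significance is just the number of genes tested times the threshold. A minimal Python sketch follows; the function name and the 20,000-gene figure are hypothetical, chosen only for illustration and not taken from Jessica's data.

```python
def expected_chance_hits(n_genes_tested: int, alpha: float = 0.05) -> float:
    """Expected number of genes with nominal p < alpha if no gene is truly changed.

    Under the global null hypothesis p-values are uniform on [0, 1], so each
    gene has probability alpha of falling below the threshold by chance.
    """
    return n_genes_tested * alpha


# Hypothetical example: 20,000 genes tested at a nominal threshold of 0.05.
chance_hits = expected_chance_hits(20000, alpha=0.05)
print(f"Expected chance hits: about {chance_hits:.0f}")
```

So a list of nominally significant genes is only informative to the extent that it is clearly longer than this baseline, which is what the correction algorithms try to assess.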

So maybe you are asking: do these strategies work?

The short answer is that they definitely do not work perfectly for every experiment or for every gene in any experiment.

I look at a lot of data where the effects are pretty subtle, and the usual multiple hypothesis testing 
corrections suggest that there are *no* differentially expressed genes in my data.

I have yet to run into a situation where follow-up measurements and experiments validated the assertion that *no* genes were 
in fact differentially expressed. 

If I were in your shoes, as I have been on numerous occasions, I would do at least some of the following.

1) Establish that samples from your experimental groups are more like samples from the same group than 
they are like samples from other groups, using comparisons like PCA, multidimensional scaling or hierarchical clustering.

2) Consider your a priori knowledge of genes that you expected to change. Did they change in the direction you 
expected?

3) Use qPCR or some other low-throughput method on single genes to reassure yourself that genes of interest (say, genes with a fold change 
of 2 and a nominal, unadjusted p-value of 0.05) are in fact regulated, even though multiple hypothesis testing correction suggests that
they may be the result of chance.
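
The three checks above can be sketched in a few lines of Python. This is only a rough illustration under stated assumptions: a plain within-group versus between-group distance comparison stands in for PCA, MDS or clustering; every sample name, gene name and number is invented; and I read "fold change of 2" as |log2 fold change| >= 1 with a nominal p-value of at most 0.05.

```python
import math
from itertools import combinations
from statistics import mean


def group_separation(samples, groups):
    """Check 1: mean Euclidean distance within groups vs. between groups."""
    within, between = [], []
    for a, b in combinations(samples, 2):
        d = math.dist(samples[a], samples[b])
        (within if groups[a] == groups[b] else between).append(d)
    return mean(within), mean(between)


def direction_agreement(results, expected_sign):
    """Check 2: did each a priori gene move in the expected direction?"""
    return {g: (results[g][0] > 0) == (sign > 0) for g, sign in expected_sign.items()}


def shortlist(results, min_abs_log2fc=1.0, max_nominal_p=0.05):
    """Check 3: candidates for qPCR follow-up, chosen on unadjusted p-values."""
    return [g for g, (log2fc, p) in results.items()
            if abs(log2fc) >= min_abs_log2fc and p <= max_nominal_p]


# Invented log-scale expression values for three genes in four samples.
samples = {
    "ctl_1": [5.0, 8.1, 2.0], "ctl_2": [5.2, 7.9, 2.1],
    "trt_1": [6.1, 8.0, 3.0], "trt_2": [6.0, 8.2, 3.2],
}
groups = {"ctl_1": "ctl", "ctl_2": "ctl", "trt_1": "trt", "trt_2": "trt"}

# Invented per-gene results: (log2 fold change, nominal p-value).
results = {
    "geneA": (1.3, 0.01), "geneB": (0.4, 0.03),
    "geneC": (-1.1, 0.04), "geneD": (2.0, 0.20),
}

within, between = group_separation(samples, groups)
print("groups separate:", within < between)
print("directions:", direction_agreement(results, {"geneA": +1, "geneC": +1}))
print("qPCR candidates:", shortlist(results))
```

If the within-group distances are not clearly smaller than the between-group ones, I would worry about the samples before worrying about the gene list.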

Good luck,

Tom


On Jul 15, 2014, at 5:54 PM, Jessica Perry Hekman wrote:

> I'm getting only a few dozen differentially expressed genes when I analyze my RNA-Seq data with DESeq2 (79) and edgeR (34) (even fewer when I use EBSeq). I had expected many more -- hundreds or even a thousand. If this is the real answer, I'm fine with it, but I'm concerned that I'm doing something wrong. What are the ranges of numbers of differentially expressed genes that one would expect from DESeq2 or edgeR?
> 
> More information:
> 
> I'm in the midst of my first RNA-seq project (as many of you have probably surmised from my frequent postings to a variety of lists). My initial goal is to get a list of differentially expressed (DE) genes.
> 
> I have 24 samples, 12 from each of 2 treatment groups.
> 
> My species is fox (Vulpes vulpes), which aligns very nicely to dog (Canis familiaris).
> 
> My current approach is to use the dog reference genome (to which my fox reads align at about 83%) + GTF with location of exons.
> 
> Can I feel confident about DESeq2's and edgeR's calls?
> 
> Thanks very much for any insights,
> 
> Jessica
> 
> -- 
> Jessica P. Hekman, DVM, MS
> PhD student, University of Illinois, Urbana-Champaign
> Animal Sciences / Genetics, Genomics, and Bioinformatics