[BioC] Expected number of DE genes?

Wed Jul 16 02:15:31 CEST 2014

Thanks, Tom. Yes, you summarized my dilemma well, although right now I am 
more concerned with false negatives than false positives: we intend to 
validate any positives we get by PCR, but any false negatives are lost 
forever :).  Ryan also suggested PCA, and I'll definitely be trying that 
next. It is very helpful to hear that others have used these techniques 
and feel moderately confident in them (with follow-up work).

Thanks,
Jessica

Jessica P. Hekman, DVM, MS
PhD student, University of Illinois, Urbana-Champaign
Animal Sciences / Genetics, Genomics, and Bioinformatics

On 07/15/2014 06:52 PM, Thomas H. Hampton wrote:
> Dear Jessica,
>
> The answer to this sort of question depends on the question you are actually asking.
>
> You no doubt know a) that in this sort of test many genes are expected to reach
> nominal significance by chance and b) that software packages include algorithms
> intended to differentiate between these predicted "false positives" and "true positives".
>
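To make point (a) concrete, here is a minimal sketch in plain Python (not the code DESeq2 or edgeR actually run). It draws p-values for a hypothetical 20,000 genes with no true effects at all, then applies the Benjamini-Hochberg adjustment, which to my knowledge is the default adjustment in both packages. The gene count, seed, and function name are assumptions made up for illustration.

```python
import random

def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

rng = random.Random(1)   # arbitrary fixed seed
n_genes = 20000          # hypothetical gene count
# Under the null hypothesis p-values are uniform on [0, 1].
null_p = [rng.random() for _ in range(n_genes)]

nominal_hits = sum(p < 0.05 for p in null_p)
adjusted_hits = sum(q < 0.05 for q in benjamini_hochberg(null_p))
print("nominal p < 0.05:", nominal_hits)
print("adjusted p < 0.05:", adjusted_hits)
```

Roughly 5% of the genes (about 1,000 here) pass the nominal cutoff by chance alone, while few or none are expected to survive the adjustment. The catch, as discussed in this thread, is that the same adjustment can also discard subtle but real effects.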
> So maybe you are asking: do these strategies work?
>
> The short answer is that they definitely do not work perfectly for every experiment or for every gene in any experiment.
>
> I look at a lot of data where effects are pretty subtle and the usual sort of multiple hypothesis testing corrections suggest that
> there are *no* differentially expressed genes in my data.
>
> I have yet to run into a situation where follow-up measurements and experiments validated the assertion that *no* genes were
> in fact differentially expressed.
>
> If I were in your shoes, as I have been on numerous occasions, I would do at least some of the following.
>
> 1) Establish that samples from your experimental groups are more like samples from the same group than
> they are like samples from other groups, using comparisons like PCA, multidimensional scaling (MDS) or hierarchical clustering.
>
> 2) Consider your a priori knowledge of genes that you expected to change. Did they change in the directions you thought they
> should?
>
> 3) Use qPCR or some other low-throughput method on single genes to reassure yourself that genes of interest (say genes with a fold change
> of 2 and a nominal, unadjusted p-value of 0.05) are in fact regulated, even though multiple hypothesis testing correction suggests that
> they may be the result of chance.
>
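The check in point 1 can be sketched numerically before any plotting. The example below is not PCA itself: it compares mean within-group and between-group Euclidean distances, the kind of sample-to-sample distances that MDS and hierarchical clustering work from. All data are simulated. The 12 samples per group match the design described in this thread, but the gene count, effect size, and noise level are assumptions chosen only for illustration.

```python
import math
import random
from itertools import combinations

rng = random.Random(0)            # arbitrary fixed seed
n_genes, n_per_group = 500, 12    # gene count is made up; 12 per group as in the thread
shifted = set(range(50))          # hypothetical genes with a real group effect

def simulate_sample(group):
    """Simulated log2 expression profile; group 'B' is shifted by 1 in 50 genes."""
    return [rng.gauss(8.0, 0.5) + (1.0 if group == "B" and g in shifted else 0.0)
            for g in range(n_genes)]

groups = ["A"] * n_per_group + ["B"] * n_per_group
samples = [simulate_sample(g) for g in groups]

within, between = [], []
for i, j in combinations(range(len(samples)), 2):
    d = math.dist(samples[i], samples[j])   # Euclidean distance between two profiles
    (within if groups[i] == groups[j] else between).append(d)

mean_within = sum(within) / len(within)
mean_between = sum(between) / len(between)
print(f"within-group: {mean_within:.2f}  between-group: {mean_between:.2f}")
```

If the between-group mean is not clearly larger than the within-group mean on real data, that would suggest the treatment effect is small relative to sample-to-sample noise, or that something other than treatment (a batch effect, for example) dominates the variation.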
> Good luck,
>
> Tom
>
>
> On Jul 15, 2014, at 5:54 PM, Jessica Perry Hekman wrote:
>
>> I'm getting only a few dozen differentially expressed genes when I analyze my RNA-Seq data with DESeq2 (79 genes) and edgeR (34 genes), and even fewer when I use EBSeq. I had expected many more -- hundreds or even a thousand. If this is the real answer, I'm fine with it, but I'm concerned that I'm doing something wrong. What range of numbers of differentially expressed genes would one expect from DESeq2 or edgeR?
>>
>> More information:
>>
>> I'm in the midst of my first RNA-seq project (as many of you have probably surmised from my frequent postings to a variety of lists). My initial goal is to get a list of differentially expressed (DE) genes.
>>
>> I have 24 samples, 12 from each of 2 treatment groups.
>>
>> My species is fox (Vulpes vulpes), which aligns very nicely to dog (Canis familiaris).
>>
>> My current approach is to use the dog reference genome (to which my fox reads align at about 83%) + GTF with location of exons.
>>
>> Can I feel confident about DESeq2's and edgeR's calls?
>>
>> Thanks very much for any insights,
>>
>> Jessica
>>
>> --
>> Jessica P. Hekman, DVM, MS
>> PhD student, University of Illinois, Urbana-Champaign
>> Animal Sciences / Genetics, Genomics, and Bioinformatics