[BioC] to know about the reason in results obtained using DESeq and cufflinks

Mon Aug 30 16:10:06 CEST 2010

Hi

On 08/30/2010 03:03 PM, Aniket Vatsya wrote:
> Could you please tell me why there is a large difference in the number of
> differentially expressed genes obtained from cufflinks and DESeq? I found
> nearly 3000 upregulated genes at FDR 5% using cufflinks, whereas I found just
> 50 upregulated genes at 10% using DESeq. I don't have any replicates.

I suppose, by 'cufflinks', you mean the 'cuffdiff' tool that comes with 
cufflinks.

The reason is that DESeq and cuffdiff address two apparently similar, 
but actually very different questions.

If you have two samples, cuffdiff tests, for each transcript, whether 
there is evidence that the concentration of this transcript is not the 
same in the two samples.

If you have two different experimental conditions, with replicates for 
each condition, DESeq tests whether, for a given gene, the change in 
expression strength between the two conditions is large compared to 
the variation within each replicate group.

This is a crucial difference. Imagine you had no replicates, just two 
samples, a control sample and one that was treated in some way. In the 
control sample, a certain gene has (after appropriate normalization) 100 
counts, and in the treatment sample, it has 130 counts. You might be 
tempted to conclude that the treatment causes this gene to be 
upregulated by 30%.

But now imagine you do your control experiment five times and get 100 
counts, 120 counts, 85 counts, 145 counts, and 129 counts. Now it becomes 
clear that a 30% upregulation may well mean nothing at all and could 
easily be caused by random differences between the samples that have 
nothing to do with the treatment.
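
To make this concrete, here is a small Python sketch using only the 
numbers from the example above. It is just an illustration of the 
reasoning, not what DESeq or cuffdiff actually compute; the function 
and variable names are made up for this example.

```python
from statistics import mean, stdev

# The five hypothetical control measurements from the example above
control_counts = [100, 120, 85, 145, 129]
# The single treated measurement from the example above
treated_count = 130


def replicate_summary(counts):
    """Return mean, sample standard deviation, and coefficient of variation."""
    m = mean(counts)
    s = stdev(counts)
    return m, s, s / m


ctrl_mean, ctrl_sd, ctrl_cv = replicate_summary(control_counts)

# How far is the treated value from the control mean, in units of the
# spread seen among the controls alone?
z = (treated_count - ctrl_mean) / ctrl_sd

# Does the treated value even leave the range spanned by the controls?
within_range = min(control_counts) <= treated_count <= max(control_counts)

print(f"control mean = {ctrl_mean:.1f}, sd = {ctrl_sd:.1f}, cv = {ctrl_cv:.2f}")
print(f"treated value is {z:.2f} control sds above the control mean")
print(f"treated value within control range: {within_range}")
```

The treated value of 130 lies inside the range of the five controls and 
well under one standard deviation from their mean, so on these numbers 
alone there is no reason to attribute the change to the treatment.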

This is why doing such experiments without any replicates is rather 
pointless. You simply need to know how much expression changes even if 
you try to keep the conditions constant.

cuffdiff is of course correct if it tells you that a change from 100 to 
130 counts is likely due to a real difference in transcript 
concentration between the two samples. However, this is unlikely to be 
the answer to your question, which presumably should be: In which genes 
does expression change _due_to_ the differences in treatment?

Hence, even if you had replicates, DESeq would give you far fewer hits 
than cuffdiff.

Please read the DESeq package vignette or our paper to learn about the 
assumption of variance-mean dependence and about what the "blind variance 
estimation" does, which you seem to have used (as otherwise DESeq would 
have refused to process data without replicates).
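
As a rough illustration of why the variance has to be estimated from the 
data rather than assumed, the following Python sketch compares the 
variance of the five control counts from the example above with what 
pure counting (Poisson) noise would give, namely a variance equal to 
the mean. The moment-based dispersion value assumes a negative-binomial 
form, variance = mean + alpha * mean^2, and is only a simplification for 
this example; it is not DESeq's estimator, and five values for a single 
gene are far too few for a reliable estimate.

```python
from statistics import mean, variance

# The five hypothetical control measurements from the example above
control_counts = [100, 120, 85, 145, 129]


def overdispersion(counts):
    """Compare the sample variance with the Poisson expectation (variance = mean).

    Returns the mean, the sample variance, the variance-to-mean ratio
    (about 1 for pure counting noise), and a crude moment estimate of
    alpha in the assumed model variance = mean + alpha * mean**2.
    """
    m = mean(counts)
    v = variance(counts)
    ratio = v / m
    alpha = max(0.0, (v - m) / m**2)
    return m, v, ratio, alpha


m, v, ratio, alpha = overdispersion(control_counts)

print(f"mean = {m:.1f}, variance = {v:.1f}")
print(f"variance / mean = {ratio:.2f}  (pure counting noise would give about 1)")
print(f"crude dispersion estimate alpha = {alpha:.3f}")
```

In this example the variance is several times larger than the mean, so 
a test that only accounts for counting noise would overstate how 
unusual a change from 100 to 130 is.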

   Simon