[BioC] DESeq and EdgeR log fold differences

Fri Nov 30 03:15:53 CET 2012

Dear Ioannis,

Questions like this require reproducible code examples that other people 
can run for themselves to confirm the results.  See the posting guide.

As it is, the inconsistencies in the three sets of output you give 
(samples in different orders, groups with different names, same miR with 
different IDs, logFC of different signs, etc) do not give a reader any 
confidence that the analyses are actually comparable.  According to the 
DESeq output you give, this particular miR is much lower in conditionA 
(wildtype) than in conditionB (knockout), but the raw data you give show 
very clearly that the opposite is true.

It may be that you have simply given different data to the two packages or 
else have aligned the results incorrectly.

Best wishes
Gordon

---------------- original message -----------------
[BioC] DESeq and EdgeR log fold differences
Ioannis Vlachos iv at on.gr
Thu Nov 29 23:06:17 CET 2012

Hello everyone,

I thought of conducting a parallel DE analysis with EdgeR and DESeq using 
a dataset that I have been working on lately.

The dataset has two conditions with two biological replicates each.

Let's say: Wild Type, Wild Type, Knock Out, Knock Out.

It's a smallRNA-Seq dataset, mapped to miRNAs.

I have tried various analyses using both programs and I have noticed this.

There are very large differences in fold changes for some miRNAs between 
the two programs, even when using "RLE" for EdgeR normalization.

Example:

DESeq Code:

countDataSet = newCountDataSet (DATA, condition)
countDataSet = estimateSizeFactors(countDataSet)
  countDataSet = estimateDispersions(countDataSet)
  difexp = nbinomTest (countDataSet, "WildType", "KnockOut")

one of the results is:

id	baseMean	baseMeanA	baseMeanB	foldChange
log2FoldChange	pval	padj
100	623.8597966	349.3576527	898.3619406	2.57146776
1.362592066	0.001303353	0.182310802

And the size factors for DESeq are:

sizeFactors(countDataSet)

       KO1       KO2       WT1       WT2
1.2969960 1.052 0.8850 0.84442

OK. So far so good.

EdgeR now.

dge <- DGEList(counts=DATA, group=condition)
dge<- calcNormFactors(dge)
dge <- estimateCommonDisp(dge, verbose=TRUE)
dge <-estimateTagwiseDisp(dge, verbose=TRUE)
et<- exactTest(dge)

Which results in:
 	logFC	logCPM	PValue		FDR
100	-2.750	5.814103	9.40E-09	1.20E-05

With:
dge$samples
     group lib.size norm.factors
WT1     1  2796302    0.9922204
WT2     1  2610244    0.9928183
KO1     2  3999488    1.0248098
KO2     2  3349646    0.9905555

We have logFC 1.3 for DESeq and 2.75 in EdgeR

And these results remain practically the same even by using:

dge<- calcNormFactors(dge.RLE, method="RLE")

                      logFC    logCPM       PValue          FDR
210    -2.775952  5.823856 8.047631e-09 1.030902e-05

        group lib.size norm.factors
KnockOut  3999488    1.0147156
KnockOut  3349646    0.9830163
WildType  2796302    0.9903844
WildType  2610244    1.0122579

Any thoughts?

This entry has (raw tags):
KO	KO	WT	WT
  131	123	287	195

Any thoughts on why I get 1.3 lFC vs 2.7lFC?

Thanks a lot,

Best Regards,

Ioannis

______________________________________________________________________
The information in this email is confidential and intend...{{dropped:4}}