[BioC] DESeq vs DESeq2 have different DEGs results

Michael Love michaelisaiahlove at gmail.com
Fri May 16 14:35:41 CEST 2014


hi Catalina,

If you want to test for the treatment effect and control the subject
effect you should use the reduced design of ~ subject.

In the DESeq2 vignette we say: "the user provides the full formula
(the formula stored in design(dds)), and a reduced formula, e.g. one
which does not contain the variable of interest."

For your filtering for the old version of DESeq, how did you chose the
value of theta=0.2? Note that such filtering in DESeq2 is accomplished
automatically by results(), and the function optimizes for the best
value of theta (most adjusted p-values less than a threshold). So you
do not need to pre-filter.

Mike


On Fri, May 16, 2014 at 1:42 AM, Catalina Aguilar Hurtado
<catagui at gmail.com> wrote:
> Hi I want to compare DESeq vs DESeq2 and I am getting different number of
> DEGs which I will expect to be normal. But when I compare the 149 genes iD
> that I get with DESeq with the 869 from DESeq2 there are only ~10 genes
> that are in common which I don\'92t understand  (using FDR <0.05 for both).
> I want to block the Subject effect for which I am including the reduced
> formula of ~1.
>
> Shouldn\'92t this two methods output similar results?  Because at the
> moment I could interpret my results with different ways.\
> \
> Thanks for your help,\
> \
> Catalina\
> \
> \
> This the DESeq script that I am using:\
> \
> \
> DESeq\
> \
> library(DESeq)\
> \
> co=as.matrix(read.table("2014_04_01_6h_LP.csv",header=T, sep=",",
> row.names=1))\
> \
> \
> Subject=c(1,2,3,4,5,1,2,4,5)\
> \
> Treatment=c(rep("co",5),rep("c2",4))\
> a.con=cbind(Subject,Treatment)\
> \
> cds=newCountDataSet(co,a.con)\
> \
> \
> cds <- estimateSizeFactors( cds)\
> \
> cds <- estimateDispersions(cds,method="pooled-CR",
> modelFormula=count~Subject+Treatment)\
> \
> \
> #filtering\
> \
> rs = rowSums ( counts ( cds ))\
> theta = 0.2\
> use = (rs > quantile(rs, probs=theta))\
> table(use)\
> cdsFilt= cds[ use, ]\
> \
> \
> \
> fit0 <- fitNbinomGLMs (cdsFilt, count~1)\
> \
> fit1 <- fitNbinomGLMs (cdsFilt, count~Treatment)\
> \
> pvals <- nbinomGLMTest (fit1, fit0)\
> \
> \
> padj <- p.adjust( pvals, method="BH" )\
> \
> padj <- data.frame(padj)\
> \
> row.names(padj)=row.names(cdsFilt)\
> \
> padj_fil <- subset (padj,padj <0.05 )\
> \
> dim (padj_fil)\
> \
> [1] 149   1\
> \
> \
> ------------
> \
> library ("DESeq2")\
> \
> countdata=as.matrix(read.table("2014_04_01_6h_LP.csv",header=T, sep=",",
> row.names=1))\
> \
> coldata= read.table ("targets.csv", header = T, sep=",",row.names=1)\
> \
> coldata\
> \
>>Subject Treatment\
>>F1       1        co\
>>F2       2        co\
>>F3       3        co\
>>F4       4        co\
>>F5       5        co\
>>H1       1        c2\
>>H2       2        c2\
>>H4       4        c2\
>>H5       5        c2\
> \
> dds <- DESeqDataSetFromMatrix(\
>   countData = countdata,\
>   colData = coldata,\
>   design = ~ Subject + Treatment)\
> dds\
> \
> dds$Treatment <- relevel (dds$Treatment, "co")\
> \
> \
> dds <- estimateSizeFactors( dds)\
> \
> dds <- estimateDispersions(dds)\
> \
> \
> rs = rowSums ( counts ( dds ))\
> theta = 0.2\
> use = (rs > quantile(rs, probs=theta))\
> table(use)\
> ddsFilt= dds[ use, ]\
> \
> \
> dds <- nbinomLRT(ddsFilt, full = design(dds), reduced = ~ 1)\
> \
> resLRT <- results(dds)\
> \
> sum( resLRT$padj < 0.05, na.rm=TRUE )\
> \
> #[1] 869}
>
>         [[alternative HTML version deleted]]
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor



More information about the Bioconductor mailing list