[BioC] Deseq for pairwise comparison among many pairs

Tue Jul 22 13:44:19 CEST 2014

hi Avichai,

On Mon, Jul 21, 2014 at 11:18 AM,  <avichai.amrad at ips.unibe.ch> wrote:
> Hi Michael,
>
> I have a dataset of five different conditions with three repeats each and my aim is to conduct a few different pairwise comparison between the different conditions. I followed successfully your instructions from Wed Jun 11 15:32:11 about how to do it in DESeq 2.
> However, while analyzing the results I saw some strange numbers where big differences did not get significant Padj- when I repeated the analysis with uploading only the relevant conditions the Padj dropped dramatically.
>

This can happen when the variance of counts for the two conditions is
much smaller than the variance for the other conditions. The DESeq
model has a single dispersion estimate for each gene, so the other
conditions can pull up the dispersion estimate (both for a singel gene
and for the trend line, so pulling up all the estimates).

In this case it might make sense to analyze the conditions separately.
You might find it useful to examine the PCA plot for these samples,
which may or may not show the global differences in variances.

Mike

> One example:
>
> *        pairwise comparison from the big data set - normalized reads of condition a: 216,241; 204,221; 229,866. condition b: 11,192; 10,840; 9,172 Padj: 0.132
>
> *        comparison in a dataset of the relevant conditions only: normalized reads of condition a: 220,126; 205,579; 230,317. condition b: 11,341; 10,897; 9,233 Padj: 1.74E-215
>
> Can you think of what might have caused this problem?
> I'm wondering if I should go back and analyze each comparison separately.
>
> The basic script I have used:
>
> sampleFiles <- c("P11.txt","P12.txt","P13.txt","P21.txt","P22.txt","P23.txt","L11.txt","L12.txt","L13.txt","L21.txt","L22.txt","L23.txt","L31.txt","L32.txt","L33.txt")
> sampleCondition<-c("a","a","a","e","e","e","ILa","ILa","ILa","ILe","ILe","ILe","ILh","ILh","ILh")
> sampleTable<-data.frame(sampleName=sampleFiles, fileName=sampleFiles, condition=sampleCondition)
> ddsHTSeq<-DESeqDataSetFromHTSeqCount(sampleTable=sampleTable, directory=directory, design=~condition)
> colData(ddsHTSeq)$condition<-factor(colData(ddsHTSeq)$condition, levels=c("a","e","ILa","ILe","ILh"))
> dds<-DESeq(ddsHTSeq)
> res<-results(dds, contrast=c("condition","a","ILa"))
> res<-res[order(res$padj),]
> mcols(res,use.names=TRUE)
> write.csv(as.data.frame(res),file="a-ILa.csv")
>
> All the best,
>
> Avichai
>
>
>         [[alternative HTML version deleted]]
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor