[BioC] Experimental design with edgeR and DESeq packages (RNA-seq)

Sat Nov 17 07:42:27 CET 2012

> Date: Thu, 15 Nov 2012 12:09:10 +0100
> From: Yvan Wenger <yvan.wenger at unige.ch>
> To: bioconductor at r-project.org
> Subject: [BioC] Experimental design with edgeR and DESeq packages (RNA-seq)
>
> Hi everybody,
>
> I just started using edgeR and DESeq and am looking for confirmation 
> that I am not doing something silly.
>
> Basically, we have 7 conditions, and for only 2 of these do we have 
> biological triplicates. For simplicity, let us say that the samples are 
> "A", "A", "A", "B", "C" (most genes are NOT regulated in my experiment). 
> Let us also say that we only want to compare "B" to "C", but using all 
> the information available. Can we use the whole dataset to estimate the 
> common and tagwise dispersions, typically with the commands below? (Note 
> that here I compare "B" to "C", i.e. samples without replicates.)
>
> edgeR:
> library(edgeR)
> countTable <- read.table('mytable', header=FALSE, row.names=1)
> dge <- DGEList(counts=countTable, group=c("A","A","A","B","C"))
> dge <- calcNormFactors(dge)
> dge <- estimateCommonDisp(dge)
> dge <- estimateTagwiseDisp(dge)
> et <- exactTest(dge, pair=c("B","C"))

Yes, this is a perfectly standard analysis.  edgeR estimates the genewise 
dispersions from the three replicates in group A and uses these 
dispersions even though you are comparing B to C.

The assumption here is obviously that A, B and C are similar populations, 
so that genes with a higher biological coefficient of variation (BCV) in 
condition A also tend to have a higher BCV in conditions B and C.

Gordon
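To illustrate the principle in the reply above, here is a minimal base-R sketch. It assumes hypothetical toy counts and equal library sizes, estimates each gene's negative binomial dispersion from the group A replicates by the method of moments, and then plugs that dispersion into a simple delta-method Wald test for B vs C. edgeR itself uses quantile-adjusted conditional maximum likelihood, empirical Bayes moderation of the genewise dispersions and an exact test, so this only shows the idea of borrowing dispersions from a replicated group, not edgeR's implementation.

```r
# Hypothetical toy counts: 4 genes x 5 libraries (A1-A3 replicated, B and C single).
counts <- matrix(c(
  100, 120,  90, 110, 300,
   50,  20,  80,  40,  45,
  200, 210, 190, 205, 100,
   10,  12,   8,  30,  11
), nrow = 4, byrow = TRUE,
dimnames = list(paste0("gene", 1:4), c("A1", "A2", "A3", "B", "C")))
group <- c("A", "A", "A", "B", "C")

# Method-of-moments NB dispersion per gene, using only the replicated group A.
# Under Var(y) = mu + phi * mu^2, phi = (s^2 - mu) / mu^2, floored at 0.
# Library sizes are assumed equal, so no normalization is applied here.
a <- counts[, group == "A", drop = FALSE]
mu_a <- rowMeans(a)
var_a <- apply(a, 1, var)
phi <- pmax((var_a - mu_a) / mu_a^2, 0)

# Biological coefficient of variation (BCV) is the square root of the dispersion.
bcv <- sqrt(phi)

# Borrow phi for the unreplicated B vs C comparison (delta-method Wald test):
# Var(log y) ~ 1/mu + phi, so Var(log yC - log yB) ~ 1/yB + 1/yC + 2 * phi.
y_b <- counts[, "B"]
y_c <- counts[, "C"]
log_fc <- log(y_c / y_b)            # natural log, C relative to B
z <- log_fc / sqrt(1 / y_b + 1 / y_c + 2 * phi)
p <- 2 * pnorm(-abs(z))

print(data.frame(phi = round(phi, 4), bcv = round(bcv, 3),
                 log2fc = round(log_fc / log(2), 3), p = signif(p, 3)))
```

The sign convention matches `exactTest(dge, pair=c("B","C"))`, where a positive log-fold-change means higher expression in C than in B. With a single library in each of B and C, every gene's test leans entirely on the dispersion borrowed from A, which is why the assumption that the three populations have similar BCVs matters.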

> or
>
> DESeq:
> library(DESeq)
> countTable <- read.table('mytable.csv', header=FALSE, row.names=1)
> design <- data.frame(row.names=colnames(countTable),
>                      condition=c("A","A","A","B","C"))
> condition <- design$condition
> cds <- newCountDataSet(countTable, condition)
> cds <- estimateSizeFactors(cds)
> cds <- estimateDispersions(cds)
> res <- nbinomTest(cds, "B", "C")
>
> Is it OK to do so, i.e. to estimate the dispersion using samples that are 
> not part of the final comparison? Does this correspond to the "working 
> partially without replicates" example in the DESeq manual? Or should I 
> just treat B and C as having no replicates and ignore the other samples 
> completely?
>
> Many thanks !
>
> Yvan
