[BioC] bayseq and edgeR for multi groups comparisons

Thu Nov 17 00:38:29 CET 2011

Dear Laura,

Here is a reproducible example of edgeR code for finding genes that are DE 
between any of your cultivars, regardless of which cultivars:

nlibs <- 24
cultivar <- factor(c(1,1,2,2,3,3,4,4,5,5,6,6,7,7,8,8,9,9,10,10,11,11,12,12))
design <- model.matrix(~cultivar)
y <- matrix(rnbinom(ngenes*nlibs,mu=50,size=10),ngenes,nlibs)
d <- DGEList(counts=y)
d <- estimateGLMCommonDisp(d,design)
d <- estimateGLMTrendedDisp(d,design)
d <- estimateGLMTagwiseDisp(d,design)
fit <- glmFit(d,design)
lrt <- glmLRT(d,fit,coef=2:12)
topTags(lrt)

This test performs a likelihood ratio test, similar to an F-test on 11 
degrees of freedom.

decideTestsDGE() however doesn't work when the contrast is on more than 1 
degree of freedom.  I'm not sure what to do about that yet, because 
decideTestsDGE shows the number of up and down genes, and the concept of 
up and down doesn't exist for F-like tests.

Best wishes
Gordon

> Date: Wed, 16 Nov 2011 11:08:47 +0100
> From: lpascual <Laura.pascual at avignon.inra.fr>
> To: bioconductor at r-project.org
> Subject: [BioC] bayseq and edgeR for multi groups comparisons
> Message-ID: <4EC38BAF.3050000 at avignon.inra.fr>
> Content-Type: text/plain
>
> Dear all,
>
>
> I am starting to analyze the results of a DGE experiment, and I have
> some doubts about how to perform multi-group comparisons. Our data set
> consists in just one developmental stage, but I have data from 12
> different genotypes (I have duplicates of the samples).
>
> I will like to identify all the differentially expressed genes among the
> different genotypes. I will like to know your opinion about edgeR and
> baySeq packages as both seam to be able to perform multi-group analysis.
>
> And I will also appreciate if you could give me any idea about the best
> way to design mi contrast hypothesis.
>
>
> *When working with edgeR I have employ this type of contrast matrix for
> my analysis*
>
>
> > cultivar <-
> factor(c(1,1,2,2,3,3,4,4,5,5,6,6,7,7,8,8,9,9,10,10,11,11,12,12))
>
> > design <- model.matrix(~cultivar)
>
> > design
>
> (Intercept) cultivar2 cultivar3 cultivar4 cultivar5 cultivar6 cultivar7
> …....
>
> 1 1 0 0 0 0 0 0
>
> 2 1 0 0 0 0 0 0
>
> 3 1 1 0 0 0 0 0
>
> 4 1 1 0 0 0 0 0
>
> 5 1 0 1 0 0 0 0
>
> 6 1 0 1 0 0 0 0
>
> 7 1 0 0 1 0 0 0
>
> 8 1 0 0 1 0 0 0
>
> 9 1 0 0 0 1 0 0
>
> 10 1 0 0 0 1 0 0
>
> :
>
> :
>
> :
>
> attr(,"assign")
>
> [1] 0 1 1 1 1 1 1 1 1 1 1 1
>
> attr(,"contrasts")
>
> attr(,"contrasts")$cultivar
>
> [1] "contr.treatment"
>
>
> *However I believe that will just help me to identify the genes that are
> differentially expressed in one of the cultivars with respect to the
> rest. Besides if I keep going doing all the contrast at the same time as
> people from the bioconductor mail list suggest me to extract the
> differentially expressed genes I get a mistake.*
>
>
> > glmfit.d2 <- glmFit(d2, design, dispersion = d2$common.dispersion)
>
> > names(glmfit.d2)
>
> [1] "coefficients" "fitted.values" "deviance" "df.residual"
>
> [5] "abundance" "design" "offset" "dispersion"
>
> [9] "method" "counts" "samples"
>
> > lrt.d2 <- glmLRT(d2, glmfit.d2, coef =2:12)
>
>
> > summary(decideTestsDGE(lrt.d2, adjust.method="BH", p.value=0.05))
>
> Error in array(x, c(length(x), 1L), if (!is.null(names(x)))
> list(names(x), :
>
> attempt to set an attribute on NULL
>
>
> *If I check just one of my possible contrasts it works*
>
>
> > lrt.d2test <- glmLRT(d2, glmfit.d2, coef =2)
>
> > summary(decideTestsDGE(lrt.d2test, adjust.method="BH", p.value=0.05))
>
> [,1]
>
> -1 875
>
> 0 14026
>
> 1 1460
>
>
> *However I think I am just getting genes whose expression is different
> in one of the cultivars, but not changing in the rest. And I will like
> just to get all the genes differentially expressed no matter in which
> cultivar or cultivars.*
>
>
> *I have also try to carry these analysis with baySeq package, and I have
> design this contrast hypothesis.*
>
>
> Slot "groups":
>
> $NDE
>
> [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
>
> $DE
>
> [1] 1 1 2 2 3 3 4 4 5 5 6 6 7 7 8 8 9 9 10 10 11 11 12 12
>
>
> *But, I believe with these approach I will not be able to identify genes
> that for example may have the same expression in cultivars 1,2,3,4 but
> different to the expression in cultivars 5,6,7,8. What do you think
> about it? Do you think I need to add model for each of the possible
> combinations? *
>
> *I will appreciate any suggestion that you could give me about the best
> way to analyze those data.*
>
>
> *Thanks in advance for your attention*
>
>
>
> *Laura Pascual*

______________________________________________________________________
The information in this email is confidential and intend...{{dropped:5}}