[BioC] edgeR design matrix, one group vs average of other groups

Wed Mar 12 07:20:38 CET 2014

Hi Georg,

Gordon Smyth gave a fairly comprehensive answer to this and related
questions a little while ago, in reply to one of my own posts. Here are
the links to the relevant posts:

http://permalink.gmane.org/gmane.science.biology.informatics.conductor/52714
http://permalink.gmane.org/gmane.science.biology.informatics.conductor/52752

-Ryan

On 3/11/14, 10:54 AM, Georg Otto wrote:
> Dear Bioconductors,
>
> I am working on RNA-seq data with multiple experimental factors and I am
> trying to follow the GLM approach in chapter 3.2.3 of the edgeR manual.
>
>
>> design <- model.matrix(~0+group, data=y$samples)
>> colnames(design) <- levels(y$samples$group)
>> design
>          A B C
> sample.1 1 0 0
> sample.2 1 0 0
> sample.3 0 1 0
> sample.4 0 1 0
> sample.5 0 0 1
>
>> fit <- glmFit(y, design)
>
> I want to know which genes are differentially expressed in C compared to
> the other groups, so I chose to compare C to the average of A and B
>
>> lrt <- glmLRT(fit, contrast=c(-0.5,-0.5,1))
>
> Alternatively, I could put A and B into a single group:
>
>> design
>          A.B C
> sample.1   1 0
> sample.2   1 0
> sample.3   1 0
> sample.4   1 0
> sample.5   0 1
>
>> fit <- glmFit(y, design)
> and compare C to A.B:
>
>> lrt <- glmLRT(fit, contrast=c(-1,1))
>
> When I try this with my own data, the first approach gives me many more
> differentially expressed genes than the second one, but the second gene
> set is a subset of the first one. I would be very grateful if somebody
> could explain to me what the difference between the approaches is, and
> which one is more appropriate for my purpose (finding genes specific
> to condition C).
>
> Best wishes,
>
> Georg
>
>> sessionInfo()
> R version 3.0.1 (2013-05-16)
> Platform: x86_64-unknown-linux-gnu (64-bit)
>
> locale:
>   [1] LC_CTYPE=en_GB.UTF-8       LC_NUMERIC=C
>   [3] LC_TIME=en_GB.UTF-8        LC_COLLATE=en_GB.UTF-8
>   [5] LC_MONETARY=en_GB.UTF-8    LC_MESSAGES=en_GB.UTF-8
>   [7] LC_PAPER=C                 LC_NAME=C
>   [9] LC_ADDRESS=C               LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
>
> other attached packages:
> [1] limma_3.18.13
>
> loaded via a namespace (and not attached):
> [1] compiler_3.0.1 tools_3.0.1
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
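A minimal sketch of why the two designs in the question can behave differently. It assumes a single hypothetical gene with made-up log-scale values and uses base R's lm.fit() as a Gaussian stand-in for edgeR's negative binomial glmFit()/glmLRT(), so it illustrates the idea only and is not what edgeR computes. With ~0+group, A, B and C each get their own mean, the contrast c(-0.5,-0.5,1) estimates C minus the average of the A and B means, and any A-vs-B difference stays out of the residuals. Pooling A and B into one group pushes that difference into the residual variance (in edgeR, into the dispersion estimate), which plausibly explains why the pooled design returned fewer genes.

```r
# Sketch only (assumption): ordinary least squares on hypothetical
# log-expression values for ONE gene stands in for edgeR's negative
# binomial GLM. Not the glmFit()/glmLRT() computation.

# Sample layout from the thread: A, A, B, B, C
group <- factor(c("A", "A", "B", "B", "C"))

# Hypothetical gene where A and B differ strongly from each other
logexpr <- c(2.0, 2.2, 6.0, 6.2, 8.0)

## Approach 1: one coefficient per group (design ~0+group)
design1 <- model.matrix(~0 + group)
colnames(design1) <- levels(group)
fit1 <- lm.fit(design1, logexpr)
coef1 <- fit1$coefficients          # group means: A = 2.1, B = 6.1, C = 8.0
contrast1 <- c(-0.5, -0.5, 1)       # C minus the average of A and B
est1 <- sum(contrast1 * coef1)

## Approach 2: A and B pooled into a single group "A.B"
group2 <- factor(ifelse(group == "C", "C", "A.B"))  # levels: "A.B", "C"
design2 <- model.matrix(~0 + group2)
colnames(design2) <- levels(group2)
fit2 <- lm.fit(design2, logexpr)
coef2 <- fit2$coefficients
contrast2 <- c(-1, 1)               # C minus the pooled A.B mean
est2 <- sum(contrast2 * coef2)

# Residual variance: the pooled design absorbs the A-vs-B difference
# into its residuals, so its variance estimate is much larger
resvar1 <- sum(fit1$residuals^2) / fit1$df.residual
resvar2 <- sum(fit2$residuals^2) / fit2$df.residual

# Standard error of a contrast: sqrt(s^2 * c' (X'X)^-1 c)
contrast_se <- function(design, contrast, resvar) {
  sqrt(resvar * drop(t(contrast) %*% solve(crossprod(design)) %*% contrast))
}
t1 <- est1 / contrast_se(design1, contrast1, resvar1)
t2 <- est2 / contrast_se(design2, contrast2, resvar2)

print(c(estimate1 = est1, estimate2 = est2))
print(c(resvar1 = resvar1, resvar2 = resvar2))
print(c(t1 = t1, t2 = t2))
```

With two A and two B samples both contrasts give the same estimate (3.9); only the variance differs, so t1 (about 24.7) is far larger than t2 (about 1.5). With unequal numbers of A and B samples the pooled A.B mean would be weighted by sample size, and the two estimates would no longer agree. Note also that the first contrast can flag a gene whose C level lies between A and B but away from their average, so a significant C-vs-average result does not by itself mean the gene is specific to C.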