[BioC] help request. repost Define alternate design matrix in R

Gordon K Smyth smyth at wehi.EDU.AU
Wed May 11 06:31:04 CEST 2011


Dear Josquin,

The design matrix that you give is actually over-determined, so I don't 
think it could have come out of model.matrix() unless you've added columns 
to it later.

Are you seeking tissue-specific treatment effects, or do you want to look 
for treatment effects that are consistent across the two tissues?  I'll 
assume the latter.

If you want to get genes that are DE for Treat2 vs Treat1, and genes that 
are DE for Treat3 vs Treat1, you can proceed like this:

   design <- model.matrix(~Treat+Tissue)

I'll assume you have formed a DGEList object y, and have estimated the 
dispersion in some way using the above design matrix. Then:

   fit <- glmFit(y,design)
   lrt <- glmLRT(y,fit,coef=2)
   topTags(lrt)

gives you DE genes between Treat2 and Treat1, and

   lrt <- glmLRT(y,fit,coef=3)
   topTags(lrt)

gives you DE genes between Treat3 and Treat1.

Best wishes
Gordon


>> From: josquin.tibbits at dpi.vic.gov.au
>> Date: May 11, 2011 11:46:21 AM GMT+10:00
>> To: bioconductor at r-project.org
>> Subject: [BioC] help request. repost Define alternate design matrix in R
>>
>> Hi,
>> I am using EdgeR to find DE genes in an experiment with two factors
>> (Tissue (2 levels) and Treatment (3 levels). Currently the design matrix
>> output by model.matrix() looks like this:
>>
>>        Int     Tissue  Treat1  Treat2  Treat3
>> S1      1       0       0       0       1
>> S2      1       0       0       0       1
>> S3      1       0       0       1       0
>> S4      1       0       0       1       0
>> S5      1       0       1       0       0
>> S6      1       0       1       0       0
>> S7      1       1       0       0       1
>> S8      1       1       0       0       1
>> S9      1       1       0       1       0
>> S10     1       1       0       1       0
>> S11     1       1       1       0       0
>> S12     1       1       1       0       0
>> attr(,"assign")
>> [1] 0 1 2 2 2
>> attr(,"contrasts")
>> attr(,"contrasts")$Tissue
>> [1] "contr.treatment"
>> attr(,"contrasts")$Treat
>> [1] "contr.treatment"
>>
>> and the model is a comparison of this design and the model without the 
>> last column.
>>
>> I would like to specify a design matrix where a) I drop each of the 
>> Treat levels (in separate runs of EdgeR) to find the DE lists 
>> corresponding to each level of the Treatment and to b) specify a model 
>> where level1 of the
>>
>> Treatment is the reference against which the other levels are compared.
>>
>> I am stuck on how to specify the alternate design matrix and how to 
>> pass this to the model.matrix() command (or even if I have to do this). 
>> I am aware this is a general R command and have read the manual page 
>> and the manual pages for the contrasts() function.  My inability to 
>> make logical sense of these is no doubt testament to my lack of R 
>> expertise and some help would be appreciated.
>>
>> In reading the limma documentation it indicates that a reference can be 
>> specified by the option ref=" ". Would this also work for me e.g. 
>> model.matrix(targets, ref="Treat1")?
>>
>> For specifying the alternate contrasts would this also be achieved in a 
>> similar way to that shown in the limma manual by setting up a contrasts 
>> matrix e.g. contrast.matrix <- makeContrasts(Treat2,levels=design) to 
>> achieve also the contrast of Treat2 (still using Treat1 as a 
>> reference)?
>>
>> My next question is then the obvious, how is this then used in the 
>> testing. Do I specify it in the glmFit() or using the glmLRT() function 
>> on the original output from the glmFit but specifying the contrasts 
>> matrix as
>>
>> the option? e.g
>> glm1 <- glmFit(DGEList.object, design, dispersion = common.dispersion) #
>> fits the model
>> lrt1 <- glmLRT(DGEList.object, glm1) ## fits the original design contrast
>> lrt2 <- glmLRT(DGEList.object, glm1, contrast.matrix) ## fits the second
>> contrast
>>
>> or, am I way off track......
>> looking forward to being set straight.
>>
>> Josquin

---------------------------------------------
Professor Gordon K Smyth,
Bioinformatics Division,
Walter and Eliza Hall Institute of Medical Research,
1G Royal Parade, Parkville, Vic 3052, Australia.
smyth at wehi.edu.au
http://www.wehi.edu.au
http://www.statsci.org/smyth

______________________________________________________________________
The information in this email is confidential and intend...{{dropped:4}}



More information about the Bioconductor mailing list