[BioC] model.matrix

Gordon K Smyth smyth at wehi.EDU.AU
Fri Mar 7 01:12:59 CET 2014

Dear Mike,

> Mike Miller mike.bioc32 at gmail.com
> Thu Mar 6 12:27:01 CET 2014
> Dear All,
> This question is regarding model.matrix function and the contrasts which
> can be made after applying it. I used this function as a part of edgeR
> package.
> Here are 2 designs:
> > design_1=model.matrix(~0+ Control+ Gender+ Location, data=data_2)
> > colnames(design_1)
> [1] "Control0"  "Control1"  "Gender1"   "Location1"
> How could I get the contrast Gender1-Gender0, shouldn't it be included 
> in the columns since there is no intercept?

It is included.  It is called "Gender1".  By default, model.matrix() 
produces contrasts relative to the first level of each factor.

> If I want to see the contrast (Gender1-Gender0), I could change the order
> of the factors in the formula:
> > design_2=model.matrix(~0+ Gender+ Control+ Location, data=data_2)
> > colnames(design_2)
> [1] "Gender0"   "Gender1"   "Control1"  "Location1"
> But then there is a question: is there any mathematical difference 
> between 2 designs?

Yes, there is.  Now "Control1" represents Control1-Control0 but "Gender1" 
is just the Gender1 level itself, not a difference from Gender0.

I would suggest that you only use "0+" for oneway layouts, not for 
additive models with multiple factors.
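
As a rough illustration of the difference, here is a small sketch using a 
made-up stand-in for data_2 (three two-level factors coded 0/1) and an 
invented noise-free response y.  Neither is from Mike's real data; they are 
assumptions chosen so that the coefficients are easy to read off.

```r
# Hypothetical stand-in for data_2: every combination of three 0/1 factors
data_2 <- expand.grid(Control  = factor(0:1),
                      Gender   = factor(0:1),
                      Location = factor(0:1))

# Invented noise-free response: baseline 10, Control adds 2,
# Gender adds 3, Location adds 5
y <- 10 + 2 * (data_2$Control == "1") +
          3 * (data_2$Gender == "1") +
          5 * (data_2$Location == "1")

design_1 <- model.matrix(~ 0 + Control + Gender + Location, data = data_2)
design_2 <- model.matrix(~ 0 + Gender + Control + Location, data = data_2)

# Least-squares coefficients under each parametrization
coef_1 <- lm.fit(design_1, y)$coefficients
coef_2 <- lm.fit(design_2, y)$coefficients

round(coef_1, 6)  # "Gender1" here is the difference Gender1-Gender0
round(coef_2, 6)  # "Gender1" here is a level, "Control1" is the difference
```

With these made-up numbers, "Gender1" should come out as 3 under design_1 
(a difference) but as 13 under design_2 (the Gender1 level at the baseline 
of the other factors), while "Control1" goes from 12 to 2.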

> If someone knows a link/book where the function model.matrix is well and 
> in details explained, please let me know.

The main reference is perhaps Section 11.1 of the "An Introduction to R" 
manual that comes with R.  But I doubt you will find that fully helpful.  
You can also try asking questions on the R-help mailing list.

But really, there are two main things you need to understand to follow 
design matrices reasonably well.

First, each factor that you add to the linear model adds one fewer column 
than the factor has levels.  You start with an intercept.  Adding Control 
adds one further column (because Control has two levels).  Adding Gender 
adds one column (because Gender has two levels).  Adding Location adds 
one column (because Location has two levels).  That's four columns in total.

No matter how you parametrize, you must have exactly 4 columns.  You can 
try fiddling with the model by using "0+"; in that case the first level of 
the first factor enters in place of the intercept.  But you can't expect 
the first level of any other factor to appear because that would make more 
than 4 columns.
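
The column count can be checked directly.  The sketch below again uses a 
hypothetical stand-in for data_2 (three two-level factors, consistent with 
the column names Mike posted); the real data frame is not shown in this 
thread.

```r
# Hypothetical stand-in for data_2: three two-level factors coded 0/1
data_2 <- expand.grid(Control  = factor(0:1),
                      Gender   = factor(0:1),
                      Location = factor(0:1))

# One column for the intercept plus (number of levels - 1) per factor
expected_cols <- 1 + sum(sapply(data_2, nlevels) - 1)

with_intercept    <- model.matrix(~ Control + Gender + Location, data = data_2)
without_intercept <- model.matrix(~ 0 + Control + Gender + Location, data = data_2)

expected_cols
colnames(with_intercept)     # intercept, then one column per factor
colnames(without_intercept)  # both Control levels, then one column per other factor
ncol(with_intercept) == ncol(without_intercept)
```

Both versions have the same number of columns; "0+" only swaps the 
intercept for the first level of the first factor.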

Second, model.matrix() compares each level back to the first level of each 
factor.  So simply using ~Control+Gender+Location will give you 
coefficients representing Control1-Control0, Gender1-Gender0 and 
Location1-Location0.  That's not too difficult!
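
To see this with numbers, here is a minimal sketch with an invented, 
noise-free response on a hypothetical stand-in for data_2.  The effect 
sizes (2, 3 and 5 on a baseline of 10) are assumptions made up purely for 
illustration.

```r
# Hypothetical stand-in for data_2: three two-level factors coded 0/1
data_2 <- expand.grid(Control  = factor(0:1),
                      Gender   = factor(0:1),
                      Location = factor(0:1))

# Invented response: baseline 10, plus 2, 3 and 5 for the second level
# of Control, Gender and Location respectively
y <- 10 + 2 * (data_2$Control == "1") +
          3 * (data_2$Gender == "1") +
          5 * (data_2$Location == "1")

# Default parametrization with an intercept
design <- model.matrix(~ Control + Gender + Location, data = data_2)
fit <- lm.fit(design, y)

# Each non-intercept coefficient is a difference from the first level
round(fit$coefficients, 6)
```

The intercept should recover the baseline (all factors at their first 
level) and the other three coefficients the differences 2, 3 and 5.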

Best wishes

> Thank you very much in advance!
> Mike
