[BioC] Limma, Multiple Factors in Design <- model.matrix, help in interpretation

James W. MacDonald jmacdon at med.umich.edu
Fri Jan 25 16:22:17 CET 2008


Hi Haja,

haja kadarmideen wrote:
> Hi, I am new to this list, so apologies if this has been discussed in
> the past. Unfortunately I could not find an answer to my question in
> the limma user guide.
> 
> I have 3 different factors in my gene expression data, 1. parasite 2.
> tissue and 3. time, with 2, 3 and 4 levels respectively. I want to
> get contrasts between these levels within each factor using limma.
> 
> I have a problem with the following code:
> 
> library("limma")
> design <- model.matrix(~0+factor(eset$parasite) + factor(eset$tissue) +
>                        factor(eset$time))
> 
> This first model produces a design matrix with one column missing for
> tissue (2 levels only) and for time (3 levels only) but not for
> parasite (i.e., both levels are present). A total of 7 columns in the
> design matrix.
> 
> Column names in the design matrix are: factor(eset$parasite)HC
> factor(eset$parasite)TC  factor(eset$tissue)B  factor(eset$tissue)G
> factor(eset$time)03  factor(eset$time)07  factor(eset$time)21
> 
> The second model without '0' produces a column for intercept and one
> level less for all factors (total of 7 columns in the design matrix).
> 
> 
> Column names in the design matrix are: (Intercept)
> factor(eset$parasite)TC  factor(eset$tissue)B  factor(eset$tissue)G
> factor(eset$time)03  factor(eset$time)07  factor(eset$time)21
> 
> Then I fit the linear model to obtain results and later on to get
> contrast of interest.
> 
> fit <- lmFit(eset, design)
> ebayes <- eBayes(fit)
> 
> 
> My questions are as follows: 1. In the first model, I get the
> coefficients for each level of 'parasite' but not for tissue and for
> time. I like to know why '~0+ factor(....)' model gives mean for all
> levels in the first factor but not for other factors? What should I
> do to get the coefficients for all levels for other factors (tissue
> and time) in model.matrix syntax. 

You can't get coefficients for all levels of the other factors. A 
model matrix with a column for every level of every factor would be rank 
deficient, and the coefficients would then not be uniquely specified. In 
your case, any coefficient for tissue or time is a comparison to the 
'missing' (baseline) level (e.g., the tissue2 coefficient is actually 
tissue2-tissue1).
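
A quick way to see the rank problem is to build the design matrix for a made-up layout and check its rank. This is only a sketch in base R: the `targets` data frame is hypothetical, and the baseline level names "A" (tissue) and "00" (time) are assumptions, since your post does not name the levels that were dropped.

```r
# Hypothetical balanced layout; "A" and "00" are invented baseline names.
targets <- expand.grid(
  parasite = c("HC", "TC"),
  tissue   = c("A", "B", "G"),
  time     = c("00", "03", "07", "21"),
  stringsAsFactors = TRUE
)

# Same structure as the first model in the question.
design <- model.matrix(~ 0 + parasite + tissue + time, data = targets)
colnames(design)
ncol(design)        # 7 columns
qr(design)$rank     # 7, so the matrix has full column rank

# Force in the "missing" tissue column. It adds no new information,
# because tissueA = parasiteHC + parasiteTC - tissueB - tissueG in every row.
tissueA  <- as.numeric(targets$tissue == "A")
overfull <- cbind(design, tissueA)
ncol(overfull)      # 8 columns
qr(overfull)$rank   # still 7: rank deficient
```

With eight columns but rank seven, infinitely many coefficient vectors give the same fit, which is why model.matrix drops one level from every factor after the first.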

> 2. In the second model, what is the
> meaning of an intercept? Is this actually an estimate for the level
> which has been set to zero (due to linear dependency). So that the
> actual estimates I get for each level in parasite, tissue and time
> are indeed the differences between the levels and the one which is
> missing (set to zero)? does it only apply to parasite or for all
> factors? 

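The intercept here is the expected expression for a sample at the 
baseline level of parasite, tissue and time simultaneously. This applies 
to all factors: every other coefficient is the difference between the 
named factor level and the baseline level of that factor. In other 
words, all the coefficients apart from the intercept are already 
contrasts.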
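
To make that concrete, here is a small base R sketch with noise-free, made-up values for a single gene, so the fitted coefficients can be compared with the numbers used to build the data. Everything here (the `targets` layout, the baseline names "A" and "00", and the effect sizes) is an assumption for illustration, not your data.

```r
# Hypothetical layout and effect sizes, for illustration only.
targets <- expand.grid(
  parasite = c("HC", "TC"),
  tissue   = c("A", "B", "G"),
  time     = c("00", "03", "07", "21"),
  stringsAsFactors = TRUE
)

# Baseline cell (HC, A, 00) has value 5; everything else is an offset from it.
y <- 5 +
  1.0 * (targets$parasite == "TC") +
  2.0 * (targets$tissue == "B")  + 3.0 * (targets$tissue == "G") +
  0.5 * (targets$time == "03")   + 1.5 * (targets$time == "07") +
  2.5 * (targets$time == "21")

d_noint <- model.matrix(~ 0 + parasite + tissue + time, data = targets)
d_int   <- model.matrix(~ parasite + tissue + time, data = targets)

b_noint <- lm.fit(d_noint, y)$coefficients
b_int   <- lm.fit(d_int, y)$coefficients

round(b_noint, 6)   # parasiteHC = 5, parasiteTC = 6, then the tissue/time offsets
round(b_int, 6)     # (Intercept) = 5, parasiteTC = 1, same tissue/time offsets
```

In this sketch the intercept equals the parasiteHC coefficient of the first model, and the tissue and time coefficients are identical in both parameterizations. Only the parasite columns change meaning.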

> 3. What is a meaning of contrasts built using actual
> results; for instance B versus G in tissue or time03 versus time 21
> under both models?

You have to figure this out algebraically. You should read the limma 
user's guide carefully as there are any number of examples of design 
matrices and what the coefficients mean, as well as how to figure out 
how to compute a particular contrast.
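
As a rough illustration of the algebra, the sketch below works out tissue B versus G and time 03 versus 21 by hand in base R. The layout, the baseline names "A" and "00", and the noise-free values are all assumptions. Since tissueB means B minus baseline and tissueG means G minus baseline, the baseline cancels and B versus G is simply tissueB - tissueG under either model. In limma a vector like this would be the kind of thing passed on to contrasts.fit().

```r
# Hypothetical layout and noise-free values for one gene.
targets <- expand.grid(
  parasite = c("HC", "TC"),
  tissue   = c("A", "B", "G"),
  time     = c("00", "03", "07", "21"),
  stringsAsFactors = TRUE
)
y <- 5 +
  1.0 * (targets$parasite == "TC") +
  2.0 * (targets$tissue == "B")  + 3.0 * (targets$tissue == "G") +
  0.5 * (targets$time == "03")   + 1.5 * (targets$time == "07") +
  2.5 * (targets$time == "21")

d_noint <- model.matrix(~ 0 + parasite + tissue + time, data = targets)
d_int   <- model.matrix(~ parasite + tissue + time, data = targets)

# Columns 3-4 are tissueB, tissueG and columns 5-7 are time03, time07, time21
# in both designs, so the same contrast vectors apply to both.
b_vs_g     <- c(0, 0, 1, -1, 0, 0, 0)   # (B - A) - (G - A) = B - G
t03_vs_t21 <- c(0, 0, 0, 0, 1, 0, -1)   # (03 - 00) - (21 - 00) = 03 - 21

est <- function(design, contrast) sum(contrast * lm.fit(design, y)$coefficients)

c(est(d_noint, b_vs_g),     est(d_int, b_vs_g))       # both about -1
c(est(d_noint, t03_vs_t21), est(d_int, t03_vs_t21))   # both about -2
```

A comparison against the baseline level itself (for example B versus the dropped tissue) needs no contrast at all: it is the tissueB coefficient as it stands.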

If that still doesn't help, you should either find an introductory 
textbook on linear models to increase your knowledge or seek out a local 
statistician who would likely be able to help.

Best,

Jim


> 
> Any help in your interpretation of these results would be great.
> Actually, I prefer a model where I get results for each level. I
> think such a model would be better to fit interactions between
> different factors in this model (which is what I eventually want to
> do) and subsequent interpretation of results?
> 
> I would really appreciate your help.
> 
> Many thanks
> Haja
> 
> Haja Kadarmideen DVM MVSc PhD
> CSIRO Livestock Industries
> Rockhampton, QLD 4701 Australia
> 
> _______________________________________________ Bioconductor mailing
> list Bioconductor at stat.math.ethz.ch 
> https://stat.ethz.ch/mailman/listinfo/bioconductor Search the
> archives:
> http://news.gmane.org/gmane.science.biology.informatics.conductor

-- 
James W. MacDonald, M.S.
Biostatistician
Affymetrix and cDNA Microarray Core
University of Michigan Cancer Center
1500 E. Medical Center Drive
7410 CCGC
Ann Arbor MI 48109
734-647-5623


