[BioC] Design matrix for simple time course

michael watson (IAH-C) michael.watson at bbsrc.ac.uk
Mon Mar 6 11:09:41 CET 2006


Hi Jim

Thanks for the information - very clear and succinct :)  I understand
the difference between the models, just not how the differently
structured design matrices relate to them.

Thanks
Mick

-----Original Message-----
From: James W. MacDonald [mailto:jmacdon at med.umich.edu] 
Sent: 03 March 2006 18:18
To: michael watson (IAH-C)
Cc: Bioconductor
Subject: Re: [BioC] Design matrix for simple time course

Hi Mick,

michael watson (IAH-C) wrote:
> Hi
> 
> I am trying to create a design matrix for a simple, one-channel 
> time-course experiment.
> 
> I have five time points with three replicated arrays at each time point.
> I want to set up the design matrix.
> 
> I tried using:
> 
> 	model.matrix(~factor(rep(1:5,each=3)))
> 
> Vaguely following the tutorial here
> (http://bioinf.wehi.edu.au/marray/jsm2005/lab5/lab5.html)
> 
> However, I only have one factor to model, time.
> 
> The matrix that comes out has a first column of all 1's, the 
> intercept.  What I (think) I want is for the first column to have three 
> 1's and the rest 0's.
> 
> I guess I'm really struggling as I don't know what the difference is 
> between the output of model.matrix, with an Intercept column of all 
> 1's, and the design matrix I want, which has a first column of three 
> 1's at the top and the rest 0's.

This is a problem. If you are trying to analyze your data using a
sophisticated tool like limma but you don't understand the models you
are fitting, I would venture to say that you are putting the cart before
the horse. I would strongly recommend either finding a local
statistician who is willing to sit down with you and explain the
difference between cell means and factor effects ANOVA models, or at
the very least perusing a textbook that covers these topics.

I would recommend something like 'Applied Linear Statistical Models' by
Neter, Kutner, Nachtsheim and Wasserman, which gives many clear examples
and is highly approachable.

As a start, here is the basic difference between the two models. In a
factor effects model (the one with an intercept, given by all 1's in the
first column), the intercept estimates the mean expression at one time
point (in this case, the first time point), and the other four
coefficients estimate the *difference* between each remaining time point
and the first (e.g., time2 - time1, time3 - time1, etc.). In this
scenario you might not need a contrast matrix if these are the
comparisons you are interested in. If you want other comparisons, then
you have to do the algebra to figure out the correct contrast matrix.
For example, time3 - time2 is the time3 coefficient minus the time2
coefficient, because the time1 baseline cancels out.
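A minimal sketch in base R of what the factor effects parameterization means, assuming hypothetical expression values for a single gene (the `y` vector is made up for illustration and is not data from this thread). It builds the same intercept design as in the original question, fits it by least squares with `lm.fit()`, and checks that the intercept is the time 1 mean and the other coefficients are differences from time 1. The contrast vector `c_t3_t2` is likewise a hypothetical name for the time3 - time2 comparison.

```r
# Five time points, three replicate arrays each (as in the question)
time <- factor(rep(1:5, each = 3))

# Factor effects design: column 1 is the intercept (all 1's),
# columns 2-5 are indicators for time points 2-5
design_fe <- model.matrix(~ time)

# Hypothetical log-expression values for one gene, illustration only
y <- c(5.0, 5.2, 4.8,   # time 1
       6.1, 5.9, 6.0,   # time 2
       7.0, 7.2, 7.1,   # time 3
       6.5, 6.4, 6.6,   # time 4
       5.5, 5.6, 5.4)   # time 5

# Least-squares fit of y on the design matrix
coef_fe <- lm.fit(design_fe, y)$coefficients

# Sample mean at each time point, for comparison
means <- sapply(split(y, time), mean)

# Intercept equals the time 1 mean;
# coefficients 2-5 equal (time k mean) - (time 1 mean)
all.equal(unname(coef_fe[1]), unname(means[1]))
all.equal(unname(coef_fe[-1]), unname(means[-1] - means[1]))

# time3 - time2 is not a coefficient, so it needs a contrast:
# (time3 - time1) - (time2 - time1), i.e. coefficient 3 minus coefficient 2
c_t3_t2 <- c(0, -1, 1, 0, 0)
t3_minus_t2 <- sum(c_t3_t2 * coef_fe)
t3_minus_t2
```

With real arrays, limma would fit every gene at once from the same `design_fe`; the point here is only how each column and coefficient maps to the time points.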

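In a cell means model, you are estimating the mean expression at each
time point, so you have to set up explicit contrasts for whatever
comparisons you are interested in. The design matrix for this model is
the one you described: the first column has three 1's followed by 0's.
As Ben Bolstad already noted, you fit this model by adding a -1 (or a 0)
to the formula in your call to model.matrix(), e.g.
model.matrix(~0 + factor(rep(1:5, each=3))).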
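A minimal sketch in base R of the cell means version, again assuming hypothetical expression values for one gene (`y` is invented for illustration). It shows that the no-intercept design has the "three 1's, then 0's" first column, that each coefficient is a time point mean, and how an explicit contrast matrix (one column per comparison; the column names are hypothetical) turns those means into differences. In limma the same kind of matrix would typically be built with `makeContrasts()` and applied with `contrasts.fit()`; here the multiplication is done by hand so every step is visible.

```r
# Five time points, three replicate arrays each
time <- factor(rep(1:5, each = 3))

# Cell means design: no intercept; column k is 1 for arrays at time k.
# Column 1 is three 1's followed by twelve 0's.
design_cm <- model.matrix(~ 0 + time)   # ~ -1 + time gives the same columns
design_cm[, 1]

# Hypothetical log-expression values for one gene, illustration only
y <- c(5.0, 5.2, 4.8, 6.1, 5.9, 6.0, 7.0, 7.2, 7.1,
       6.5, 6.4, 6.6, 5.5, 5.6, 5.4)

# Each coefficient is simply the mean at that time point
coef_cm <- lm.fit(design_cm, y)$coefficients

# Explicit contrast matrix: rows follow the coefficients time1..time5,
# one column per comparison of interest
cont <- cbind(
  t2_vs_t1 = c(-1, 1, 0, 0, 0),
  t3_vs_t2 = c( 0, -1, 1, 0, 0),
  t5_vs_t1 = c(-1, 0, 0, 0, 1)
)

# Contrast estimates: t(C) %*% beta, one value per column of C
estimates <- drop(t(cont) %*% coef_cm)
estimates
```

Both designs describe the same fitted means; they differ only in what each coefficient stands for, which is why the cell means model needs contrasts for every comparison.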

HTH,

Jim


> 
> :-s
> 
> Thanks
> Mick
> 
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/bioconductor


--
James W. MacDonald, M.S.
Biostatistician
Affymetrix and cDNA Microarray Core
University of Michigan Cancer Center
1500 E. Medical Center Drive
7410 CCGC
Ann Arbor MI 48109
734-647-5623


