[BioC] Limma; matrix of microarray data and design matrix

James W. MacDonald jmacdon at med.umich.edu
Mon Jun 6 15:33:23 CEST 2011

Hi John,

On 6/6/2011 6:47 AM, john herbert wrote:
> Thank you James, that is very helpful.
> In terms of why, I am not sure at the moment.
> To be honest, I don't have any idea about the stats here.
> Take the tilde for instance. Searching online finds:
> 1. In asymptotic notation <http://mathworld.wolfram.com/AsymptoticNotation.html>,
> f ~ phi is used to mean that f/phi -> 1.
> 2. In statistics <http://en.wikipedia.org/wiki/Statistics> and probability
> theory <http://en.wikipedia.org/wiki/Probability_theory>, "~" means "is
> distributed as".
> How does that fit in with ~f?
> 0 compared factor variables?

No. Within R the tilde has a different meaning: it marks a model
formula, and the terms to its right form the right hand side of the
model equation. The default in R is to fit an intercept in all linear
models (which in the context of ANOVA is better thought of as a
'baseline' group, to which all other groups are compared).
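
As a quick check of the formula behaviour described above, the following
base R sketch inspects a one-sided formula and asks terms() whether an
intercept is included. The variable names (fm, icpt_default, and so on)
are made up for illustration.

```r
# A one-sided formula: nothing left of the tilde, model terms on the right.
fm <- ~f
fm_class <- class(fm)                                # "formula"

# terms() records whether the model includes an intercept (1) or not (0).
icpt_default <- attr(terms(~f), "intercept")         # intercept kept by default
icpt_zero    <- attr(terms(~0 + f), "intercept")     # intercept removed with 0
icpt_minus1  <- attr(terms(~f - 1), "intercept")     # intercept removed with -1

print(c(default = icpt_default, zero = icpt_zero, minus1 = icpt_minus1))
```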

So when you do something like

f <- factor(rep(c("A","B"), each = 3))
design <- model.matrix(~f)

you are by default setting the 'A' samples as the baseline: the first
coefficient (the intercept) is the mean of the A samples, and the
second coefficient in the model is the B - A comparison.
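
To make the meaning of the two coefficients concrete, here is a small
sketch in base R. The expression values in y are invented for a single
hypothetical gene, purely for illustration; with real data limma fits
the same model to every row of the expression matrix.

```r
f <- factor(rep(c("A", "B"), each = 3))
design <- model.matrix(~f)
design_cols <- colnames(design)   # "(Intercept)" "fB"

# Invented log-expression values for one gene: three A samples, then three B.
y <- c(5.0, 5.2, 4.8, 7.1, 6.9, 7.0)

# Ordinary least squares fit with the same parameterization as the design.
coefs <- coef(lm(y ~ f))

mean_A <- mean(y[f == "A"])
mean_B <- mean(y[f == "B"])

# The intercept is the A mean; the second coefficient is the B - A difference.
same <- isTRUE(all.equal(unname(coefs), c(mean_A, mean_B - mean_A)))
print(same)
```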

To eliminate the intercept, you add either a 0 or a -1 to the right hand 
side of the equation:

design <- model.matrix(~0+f)

which will then compute the average expression of the A and B samples 
separately, so you have to explicitly create a contrasts matrix in order 
to compute the B - A contrast.
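
A sketch of the no-intercept route follows. The first part uses base R
and the same invented values for one gene to show that the B - A
contrast is just a weighted sum of the two group means. The second part
runs only if limma is installed and uses a simulated expression matrix
(expr, a stand-in for real normalized data); renaming the design columns
to "A" and "B" is a convenience assumed here so the contrast can be
written as B - A.

```r
f <- factor(rep(c("A", "B"), each = 3))
design <- model.matrix(~0 + f)
colnames(design) <- levels(f)     # rename "fA", "fB" to "A", "B"

# Invented log-expression values for one gene (illustration only).
y <- c(5.0, 5.2, 4.8, 7.1, 6.9, 7.0)
group_means <- unname(coef(lm(y ~ 0 + f)))   # mean of A, mean of B

# The B - A contrast as weights on the two group means.
contrast <- c(-1, 1)
b_minus_a <- sum(contrast * group_means)
print(b_minus_a)

if (requireNamespace("limma", quietly = TRUE)) {
  set.seed(1)
  # Simulated data: 100 genes by 6 arrays, columns ordered A, A, A, B, B, B.
  expr <- matrix(rnorm(100 * 6), nrow = 100,
                 dimnames = list(paste0("gene", 1:100), paste0("s", 1:6)))
  contrast.matrix <- limma::makeContrasts(BvsA = B - A, levels = design)
  fit  <- limma::lmFit(expr, design)
  fit2 <- limma::eBayes(limma::contrasts.fit(fit, contrast.matrix))
  print(limma::topTable(fit2, coef = "BvsA"))
}
```

With the intercept model (~f) the same comparison is available directly
as the second coefficient, so no contrasts matrix is needed for a simple
two-group design.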

See the limmaUsersGuide, and ?formula for more information.

You might also consider looking at Julian Faraway's excellent book on 
using R to fit linear models. This used to be a pdf he gave away for 
free, but is now published. However, some work with the googles might 
get you to the pdf if it is floating around on somebody's website.



> On Fri, Jun 3, 2011 at 2:20 PM, john herbert<arraystruggles at gmail.com>wrote:
>> Dear Bioconductors.
>> I have a six column matrix of one colour array data (first 3 columns are
>> case, second 3 are control), quantile normalized.
>> I would like to do simple differential gene expression using limma.
>> Is there a line or two of code that generates a simple design matrix for
>> this scenario?
>> I usually use a design matrix created from a targets file, and I never
>> really understand lines like...  design<- model.matrix(~0+f) (what is
>> ~0+f)?
James W. MacDonald, M.S.
Douglas Lab
University of Michigan
Department of Human Genetics
5912 Buhl
1241 E. Catherine St.
Ann Arbor MI 48109-5618
