[BioC] Limma; matrix of microarray data and design matrix

James W. MacDonald jmacdon at med.umich.edu
Mon Jun 6 15:33:23 CEST 2011

```
Hi John,

On 6/6/2011 6:47 AM, john herbert wrote:
> Thank you James, that is very helpful.
> In terms of why, I am not sure at the moment.
>
> To be honest, I don't have any idea about the stats here.
>
> Take the tilde for instance. searching online finds.
> 1. In asymptotic notation<http://mathworld.wolfram.com/AsymptoticNotation.html>
> , [image: f∼phi] is used to mean that [image: f/phi->1].
> 2. In statistics<http://en.wikipedia.org/wiki/Statistics>  and probability
> theory<http://en.wikipedia.org/wiki/Probability_theory>, ‹~› means “is
> distributed as”
>
> How does that fit in with ~f?
> 0 compared factor variables?

No. The tilde has a different meaning within R: it introduces the
right-hand side of a model formula. The default in R is to fit an
intercept in all linear models (which in the context of ANOVA is better
thought of as a 'baseline' group, to which all other groups are compared).

So when you do something like

f <- factor(rep(c("A","B"), each = 3))
design <- model.matrix(~f)

you are by default setting the 'A' samples as the baseline, so the first
coefficient in the model is the average of the A samples and the second
coefficient is the B - A comparison.

To eliminate the intercept, you add either a 0 or a -1 to the right-hand
side of the formula:

design <- model.matrix(~0+f)

which, once the model is fitted, estimates the average expression of the
A and B samples separately, so you have to explicitly create a contrasts
matrix in order to compute the B - A contrast.

You might also consider looking at Julian Faraway's excellent book on
using R to fit linear models. This used to be a pdf he gave away for
free, but is now published. However, some work with the googles might
get you to the pdf if it is floating around on somebody's website.

Best,

Jim

>
> On Fri, Jun 3, 2011 at 2:20 PM, john herbert<arraystruggles at gmail.com>wrote:
>
>> Dear Bioconductors.
>> I have a six column matrix of one colour array data (first 3 columns are
>> case, second 3 are control), quantile normalized.
>>
>> I would like to do simple differential gene expression using limma.
>>
>> Is there a line or two of code that generates a simple design matrix for
>> this scenario?
>>
>> I usually use a design matrix created from a targets file, and I never
>> really understand lines like...  design<- model.matrix(~0+f) (what is
>> ~0+f)?
>>
>>
>

--
James W. MacDonald, M.S.
Biostatistician
Douglas Lab
University of Michigan
Department of Human Genetics
5912 Buhl
1241 E. Catherine St.
Ann Arbor MI 48109-5618
734-615-7826
```
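
The two parameterisations described in the reply can be compared side by side. What follows is only a minimal sketch in base R: the expression values for the single gene are made up for illustration, and `lm.fit` stands in for the per-gene fit that limma would do across the whole matrix. It shows that the second coefficient of `~f` equals the B - A contrast formed by hand from the `~0+f` fit.

```r
# Hypothetical toy data: one gene measured on six arrays (3 "A", then 3 "B").
f <- factor(rep(c("A", "B"), each = 3))
y <- c(5.1, 4.9, 5.0, 7.2, 6.8, 7.0)  # made-up log-expression values

# With an intercept: the columns are "(Intercept)" and "fB".
design1 <- model.matrix(~f)
coef1 <- lm.fit(design1, y)$coefficients
# coef1["(Intercept)"] is the mean of the A samples,
# coef1["fB"] is mean(B) - mean(A).

# Without an intercept: the columns are "fA" and "fB", one per group.
design2 <- model.matrix(~0 + f)
coef2 <- lm.fit(design2, y)$coefficients
# coef2["fA"] and coef2["fB"] are the two group means.

# The B - A comparison now has to be formed explicitly as a contrast.
contrast <- c(fA = -1, fB = 1)
b_minus_a <- sum(contrast * coef2)

print(coef1)
print(coef2)
print(b_minus_a)
```

In limma itself, the same contrast would typically be built with `makeContrasts(fB - fA, levels = design2)` and applied with `contrasts.fit` after `lmFit`, followed by `eBayes`; the hand-built vector above is just the one-gene equivalent.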