[BioC] Limma; matrix of microarray data and design matrix

Mon Jun 6 15:33:23 CEST 2011

Hi John,

On 6/6/2011 6:47 AM, john herbert wrote:
> Thank you James, that is very helpful.
> In terms of why, I am not sure at the moment.
>
> To be honest, I don't have any idea about the stats here.
>
> Take the tilde, for instance. Searching online finds:
> 1. In asymptotic notation (http://mathworld.wolfram.com/AsymptoticNotation.html),
> f ~ phi is used to mean that f/phi -> 1.
> 2. In statistics (http://en.wikipedia.org/wiki/Statistics) and probability
> theory (http://en.wikipedia.org/wiki/Probability_theory), "~" means "is
> distributed as".
>
> How does that fit in with ~f?
> 0 compared factor variables?

No. Within R the tilde has a different meaning: it creates a model 
formula, and what follows it is the right-hand side of the model 
equation. The default in R is to fit an intercept in all linear models 
(which in the context of ANOVA is better thought of as a 'baseline' 
group, to which all other groups are compared).
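To see that the tilde just builds a formula object, and where the intercept is recorded, here is a small base-R sketch (the object names f and fo are only for illustration). terms() stores an "intercept" attribute that is 1 by default and 0 when you add a 0 or a -1:

```r
# A one-sided formula is just an R object; nothing is fitted yet.
f <- factor(rep(c("A", "B"), each = 3))
fo <- ~ f
print(class(fo))  # "formula"

# terms() records whether an intercept is included (1 = yes, 0 = no).
print(attr(terms(~ f), "intercept"))       # 1: intercept included by default
print(attr(terms(~ 0 + f), "intercept"))   # 0: removed by adding 0
print(attr(terms(~ -1 + f), "intercept"))  # 0: removed by adding -1
```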

So when you do something like

f <- factor(rep(c("A","B"), each = 3))
design <- model.matrix(~f)

you are by default setting the 'A' samples (the first level of f) as the 
baseline, so the intercept estimates the mean of the A samples and the 
second coefficient in the model is the B - A comparison.
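One way to convince yourself of this is to fit a single gene by ordinary least squares with base R's lm.fit(). The six expression values below are made up purely for illustration; limma fits this same kind of linear model to every gene on the array.

```r
f <- factor(rep(c("A", "B"), each = 3))
design <- model.matrix(~ f)
print(colnames(design))  # "(Intercept)" "fB"

# Made-up expression values for one gene (illustration only).
y <- c(5.1, 4.9, 5.0, 7.2, 6.8, 7.0)

# Least-squares fit of y on the design matrix.
coefs <- lm.fit(design, y)$coefficients
print(coefs)
# (Intercept) is the mean of the A samples (5.0 here);
# fB is mean(B) - mean(A) (7.0 - 5.0 = 2.0 here).
```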

To eliminate the intercept, you add either a 0 or a -1 to the right-hand 
side of the equation:

design <- model.matrix(~0+f)

which will then compute the average expression of the A and B samples 
separately, so you have to explicitly create a contrasts matrix in order 
to compute the B - A contrast.
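With the same made-up values, the ~0+f design gives one coefficient per group, and the B - A comparison is recovered by applying a contrast of -1 on fA and +1 on fB. In limma the contrasts matrix is usually built with makeContrasts(); that part of this sketch only runs if limma is installed, and the contrast name BvsA is arbitrary.

```r
f <- factor(rep(c("A", "B"), each = 3))
design <- model.matrix(~ 0 + f)
print(colnames(design))  # "fA" "fB": one column per group, no intercept

y <- c(5.1, 4.9, 5.0, 7.2, 6.8, 7.0)       # made-up values for one gene
coefs <- lm.fit(design, y)$coefficients   # estimates mean(A) and mean(B)

# Contrast vector for B - A: weight -1 on fA and +1 on fB.
contrast <- c(fA = -1, fB = 1)
b_minus_a <- sum(contrast * coefs[names(contrast)])
print(b_minus_a)  # same value as the fB coefficient from the ~f model

# The limma way to build the contrasts matrix (only if limma is installed).
if (requireNamespace("limma", quietly = TRUE)) {
  cm <- limma::makeContrasts(BvsA = fB - fA, levels = design)
  print(cm)
}
```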

See the limma User's Guide (limmaUsersGuide()) and ?formula for more information.
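For the original question (six quantile-normalized arrays, first three case, last three control), a minimal sketch of the usual limma steps might look like the following. The matrix is simulated random data standing in for the real one, and the names exprs_mat and group, as well as the choice of 'control' as the baseline level, are assumptions for illustration; with real data you would pass your own matrix to lmFit(). The limma part only runs if limma is installed.

```r
# Simulated stand-in for the real matrix: 1000 probes x 6 arrays,
# columns 1-3 = case, 4-6 = control (random numbers, not real data).
set.seed(1)
exprs_mat <- matrix(rnorm(1000 * 6, mean = 8), ncol = 6,
                    dimnames = list(paste0("probe", 1:1000),
                                    c(paste0("case", 1:3), paste0("ctrl", 1:3))))

# Group labels in column order; listing "control" first in levels makes it
# the baseline, so the second coefficient is case - control.
group <- factor(rep(c("case", "control"), each = 3),
                levels = c("control", "case"))
design <- model.matrix(~ group)
print(design)

if (requireNamespace("limma", quietly = TRUE)) {
  fit <- limma::lmFit(exprs_mat, design)  # one linear model per probe
  fit <- limma::eBayes(fit)               # moderated t-statistics
  top <- limma::topTable(fit, coef = "groupcase", number = 10)
  print(top)
}
```

Because 'control' is the first level, the groupcase coefficient is the case - control difference; making 'case' the first level instead would flip its sign.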

You might also consider looking at Julian Faraway's excellent book on 
using R to fit linear models. This used to be a PDF he gave away for 
free, but it is now published. However, some work with the googles might 
get you to the PDF if it is floating around on somebody's website.

Best,

Jim

>
> On Fri, Jun 3, 2011 at 2:20 PM, john herbert <arraystruggles at gmail.com> wrote:
>
>> Dear Bioconductors.
>> I have a six column matrix of one colour array data (first 3 columns are
>> case, second 3 are control), quantile normalized.
>>
>> I would like to do simple differential gene expression using limma.
>>
>> Is there a line or two of code that generates a simple design matrix for
>> this scenario?
>>
>> I usually use a design matrix created from a targets file, and I never
>> really understand lines like design <- model.matrix(~0+f) (what is
>> ~0+f?).
>>
>>
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor

-- 
James W. MacDonald, M.S.
Biostatistician
Douglas Lab
University of Michigan
Department of Human Genetics
5912 Buhl
1241 E. Catherine St.
Ann Arbor MI 48109-5618
734-615-7826