[BioC] edgeR, multifactorial design

Gordon K Smyth smyth at wehi.EDU.AU
Sun Feb 9 05:48:47 CET 2014


Dear Mike,

You are asking basic questions about interaction formulas in R.  Many 
non-statisticians find model formulas in R a bit confusing.  It would be 
simpler and just as effective to take the alternative approach described 
in the section "Defining each treatment combination as a group" of the 
edgeR User's Guide.
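
As a rough sketch of that approach, applied to the Treat/Time targets quoted below (the name Group and the contrast vector are illustrative choices of mine, not copied from the User's Guide):

```r
# Rebuild the Treat/Time layout from the quoted targets table (12 samples).
targets <- data.frame(
  Treat = factor(rep(c("Placebo", "Drug"), each = 6),
                 levels = c("Placebo", "Drug")),
  Time  = factor(rep(rep(c("0h", "1h", "2h"), each = 2), times = 2))
)

# Define each treatment combination as its own group.
Group <- factor(paste(targets$Treat, targets$Time, sep = "."),
                levels = c("Placebo.0h", "Placebo.1h", "Placebo.2h",
                           "Drug.0h", "Drug.1h", "Drug.2h"))

# No intercept: one coefficient per group, so every comparison is an
# explicit difference between named groups.
design <- model.matrix(~ 0 + Group)
colnames(design) <- levels(Group)

# Illustrative contrast: does the 1h response differ between drug and placebo?
# (Drug.1h - Drug.0h) - (Placebo.1h - Placebo.0h)
DrugvsPlacebo.1h <- c(Placebo.0h = 1, Placebo.1h = -1, Placebo.2h = 0,
                      Drug.0h = -1, Drug.1h = 1, Drug.2h = 0)
```

With this parametrization the effect of the placebo at 1 hour is simply Placebo.1h - Placebo.0h, and no interaction terms need to be interpreted.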

For your real experiment, you might combine disease state and localization 
into one factor (dis.loc) and use

   model.matrix(~sex+dis.loc)
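
A minimal sketch of how that might look, assuming a hypothetical balanced sample sheet (the targets data frame, the level names and the contrast below are made up for illustration and should be replaced by your real sample annotation):

```r
# Hypothetical sample sheet: 2 x 2 x 2 layout with two replicates (16 samples).
targets <- expand.grid(rep = 1:2,
                       sex = c("f", "m"),
                       disease = c("healthy", "cancer"),
                       localization = c("blood", "bones"))

sex <- factor(targets$sex)

# Combine disease state and localization into one four-level factor.
dis.loc <- factor(paste(targets$disease, targets$localization, sep = "."),
                  levels = c("healthy.blood", "healthy.bones",
                             "cancer.blood", "cancer.bones"))

design <- model.matrix(~ sex + dis.loc)

# healthy.blood is the reference level, so each dis.loc coefficient is a
# log-fold change relative to healthy blood. The question "is the cancer
# effect different in bones than in blood?" is then the contrast
# (cancer.bones - healthy.bones) - (cancer.blood - healthy.blood).
contrast <- setNames(numeric(ncol(design)), colnames(design))
contrast["dis.loccancer.bones"]  <-  1
contrast["dis.lochealthy.bones"] <- -1
contrast["dis.loccancer.blood"]  <- -1

# With a fitted edgeR GLM (not run here, needs count data):
# lrt <- glmLRT(fit, contrast = contrast)
```

Genes that are significant for this contrast are those for which the disease effect depends on the localization, which is a more direct answer than comparing numbers of differentially expressed genes.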

Best wishes
Gordon

> Date: Fri, 7 Feb 2014 14:34:04 +0100
> From: Mike Miller <mike.bioc32 at gmail.com>
> To: <bioconductor at stat.math.ethz.ch>
> Subject: [BioC] edgeR, multifactorial design
>
> Dear EdgeR community,
>
>
> I am new to edgeR and still in the phase of reading the vignette in detail
> to be able to use it for my data.
> I have a question in understanding the model.matrix.
> On page 27 (paragraph 3.3.2 "Nested interaction formulas"), the design is
> defined as:
>> targets
> Sample Treat Time
> 1 Sample1 Placebo 0h
> 2 Sample2 Placebo 0h
> 3 Sample3 Placebo 1h
> 4 Sample4 Placebo 1h
> 5 Sample5 Placebo 2h
> 6 Sample6 Placebo 2h
> 7 Sample1 Drug 0h
> 8 Sample2 Drug 0h
> 9 Sample3 Drug 1h
> 10 Sample4 Drug 1h
> 11 Sample5 Drug 2h
> 12 Sample6 Drug 2h
>
> targets$Treat <- relevel(targets$Treat, ref="Placebo")
>
> design <- model.matrix(~Treat + Treat:Time, data=targets)
>
>
> #and the coefficient names are:
>> colnames(design)
> [1] "(Intercept)" "TreatDrug"
> [3] "TreatPlacebo:Time1h" "TreatDrug:Time1h"
> [5] "TreatPlacebo:Time2h" "TreatDrug:Time2h"
>
> Whereas on page 28 (paragraph 3.3.4 "Interaction at any time") the design
> formula looks like this:
> #I added "2" in "design2" compared to original text for easier following:
>> design2 <- model.matrix(~Treat + Time + Treat:Time, data=targets)
>> colnames(design2)
> [1] "(Intercept)" "TreatDrug"
> [3] "Time1h" "Time2h"
> [5] "TreatDrug:Time1h" "TreatDrug:Time2h"
>
> It is explained that for the design2 (page 29 top):
> "The last two coefficients give the DrugvsPlacebo.1h and DrugvsPlacebo.2h
> contrasts, so that
>> lrt <- glmLRT(fit, coef=5:6)
> is useful because it detects genes that respond differently to the drug,
> relative to the placebo,
> at either of the times."
> My question is, if I understood it well: in design2, why are there no
> coefficients "TreatPlacebo:Time1h" and "TreatPlacebo:Time2h"? And shouldn't
> "Time1h" and "Time2h" be effects of time, regardless of the Treat(ment), and
> not:
> "> lrt <- glmLRT(fit, coef=3)
> and
>> lrt <- glmLRT(fit, coef=4)
> are the effects of the reference drug, i.e., the effects of the placebo at
> 1 hour and 2 hours" as it is written in the vignette text?
>
> Thank you!
> ------------------------------------
> Why I need edgeR: I have an RNASeq experiment (~30 samples), where I need
> to explore the influence of 3 factors with 2 levels each:
> 1. sex: f/m
> 2. disease_state:healthy/cancer
> 3. localization: blood/bones.
> Question I want to answer: which genes are differentially expressed between
> 2 localisations in 2 disease states (i.e. are bones more severely affected
> by cancer than blood) taking into account different sex?
> I assume that my design formula should look like:
> design=~sex+disease+localization+disease:localization
>
> Could anyone please tell me if the formula is correct? And, what should be
> the output? How could I know if the disease has different effects depending
> on the localization? By number of genes affected (=differentially
> expressed)?
>
> I would appreciate very much if someone has some time to help me with any
> of the questions.
> Best,
> Mike
