[BioC] edgeR: generating a correct design matrix - multifactorial design

Fri Jul 27 12:58:31 CEST 2012

Dear James,

Thank you for the prompt response. 
When it comes to interactions, I do tend to get stumped easily.

I had some additional queries:

1) I realized I made a typo in the array design model which would be: design = model.matrix(~0+dat$Group): to give the contrast of (Mut+ - Mut) - (WT+ - WT).

As you mentioned, I did notice the glmLRT function has a contrast parameter, but was/am not sure if the above design model (i.e. ~0+dat$Group) would work with the aforementioned contrast? Since that was limma based for arrays and this is RNA-Seq.

2) Thank you for the detailed alternative parameterization method, I realize the error in my dat object.

On another note: 

3) I do not completely follow the cpm (counts per million) concept for filtering the data. (Probably very naïve but) How does one decide on the cpm? How does one equate it to (read) counts - that's what the count table is right - number of reads? Why not just use those (i.e. say something like >= 10 reads in at least 2 samples (in my case here, as I gather from reading the user guide its about 50% of samples)?

4) Since tag-wise dispersion is recommended and one must calculate either common or trended dispersion prior to that. 
How does one decide to use common dispersion or trended dispersion or both before calculating tag-wise dispersion? (Or am I missing something obvious here).

Many Thanks in advance.

Best Wishes,
Natasha

-----Original Message-----
From: James W. MacDonald [mailto:jmacdon at uw.edu] 
Sent: 26 July 2012 16:03
To: Natasha Sahgal; 'bioconductor at r-project.org'
Subject: Re: [BioC] edgeR: generating a correct design matrix - multifactorial design

Hi Natasha,

On 7/26/12 8:29 AM, Natasha Sahgal wrote:
> Dear Prof. Gordon and List,
>
> I have an RNA-Seq expt for which I'd like to use edgeR, as it is multifactorial in design.
> Having gone through the user guide, I am a bit confused as to how to generate the model for my expt.
>
> The expt: 2 cell-lines (mut,wt), 2 conditions(stimulated, unstimulated), n=2 in each group.
> My aim: to detect DE genes based on the effect of stimulus on mut cells.
>
> Thus,
> dat
>    Sample Group Stim
> 1      1    WT   No
> 2      2    WT   No
> 3      3   WT+  Yes
> 4      4   WT+  Yes
> 5      5   Mut   No
> 6      6   Mut   No
> 7      7  Mut+  Yes
> 8      8  Mut+  Yes
>
> Now, if this were array data the model would be:
> design = model.matrix(~dat$Group)
> and whilst fitting the model I could make a contrast such as (Mut+ - 
> Mut) - (WT+ - WT)
>
> I am not sure how to do this for the RNA-Seq data (i.e. what should 
> the model be? And what coefficients should I pull out?)

You use the same exact design matrix you would have used for array data. 
Note that glmLRT() has a contrast argument, so if you parameterize the way you are doing here, you will need to use that (I am assuming you are just doing something like model.matrix(~Group), as including Stim doesn't make sense).

Alternatively you could parameterize by separating out the group from the stim and fit the interaction as part of the design matrix:

 > Group <- factor(rep(c("WT", "Mut"), each=4))  > Group <- relevel(Group, "WT")  > Stim <- factor(rep(c("No","Yes"), each=2, times=2))  > model.matrix(~Group*Stim)
   (Intercept) GroupMut StimYes GroupMut:StimYes
1           1        0       0                0
2           1        0       0                0
3           1        0       1                0
4           1        0       1                0
5           1        1       0                0
6           1        1       0                0
7           1        1       1                1
8           1        1       1                1

And glmLRT() will test the interaction term by default.

Best,

Jim
> Whether the model should be:
> 1) model.matrix(~dat$Group) and somehow in the glmLRT function specify the above contrast in some manner?
>
> 2) model.matrix(~dat$Group+dat$Group*dat$Stim) (coefficient/contrast?)
>
> 3) model.matrix(~dat$Group*dat$Stim) (coefficient/contrast?)
>
>
> I'd appreciate any help and advice.
>
>
> Many Thanks,
> Natasha
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: 
> http://news.gmane.org/gmane.science.biology.informatics.conductor