[BioC] [Bioc-sig-seq] interaction factor in edgeR

Wed May 11 04:21:59 CEST 2011

Dear Prof Smyth,

In

design <- model.matrix(~ a + b + a:b , data=targets)

my interest is in factor a (coef=2).

"Do you expect the effect of experimental factor b to be the same for each level of a?  If yes, then maybe you don't need the interaction term.  It depends on your experiment and on the questions you want to ask."

I am not sure, but I guess the answer is no. The experiment consists of embryos collected at two time points (factor a) that are either normal or cloned (factor b). On top of that, the design is unbalanced. I previously tested the hypothesis that cloning affects gene expression, for which I did not need the first factor (a). I am now using factor b as a block to test whether expression differs between time points (factor a).
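
For reference, a minimal base-R sketch of what coef=2 refers to in the design quoted at the top of this message. Only the formula comes from the message; the `targets` layout, the level names (`t1`, `t2`, `normal`, `cloned`) and the cell means are assumptions made up for illustration. With R's default treatment contrasts, the second coefficient is the effect of a at the reference level of b, not an effect of a averaged over b.

```r
# Hypothetical unbalanced 2x2 layout (level names and counts are assumptions).
targets <- data.frame(
  a = factor(c("t1", "t1", "t1", "t2", "t2", "t1", "t1", "t2", "t2", "t2"),
             levels = c("t1", "t2")),
  b = factor(rep(c("normal", "cloned"), each = 5),
             levels = c("normal", "cloned"))
)

design <- model.matrix(~ a + b + a:b, data = targets)
print(colnames(design))  # coefficient 2 is the column for factor a

# Noise-free toy response built from made-up cell means, so each fitted
# coefficient can be read directly as a contrast of those means.
cell_means <- c(t1.normal = 5, t2.normal = 7, t1.cloned = 6, t2.cloned = 11)
y <- unname(cell_means[paste(targets$a, targets$b, sep = ".")])

fit <- lm(y ~ a + b + a:b, data = targets)
print(round(coef(fit), 6))
# at2         = t2.normal - t1.normal
#               (effect of a within the reference level of b)
# at2:bcloned = (t2.cloned - t1.cloned) - (t2.normal - t1.normal)
```

If the quantity of interest is the time-point effect across both normal and cloned embryos, this coefficient alone may not be the right one to test once the interaction term is in the model.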

Please let me know if you think otherwise.

Thanks for the reply,

Fernando

________________________________________
From: Gordon K Smyth [smyth at wehi.EDU.AU]
Sent: Tuesday, May 10, 2011 6:53 PM
To: Biase, Fernando
Cc: bioc-sig-sequencing at r-project.org
Subject: [Bioc-sig-seq] interaction factor in edgeR

Dear Fernando,

> Date: Tue, 10 May 2011 13:40:23 -0500
> From: "Biase, Fernando" <biase at illinois.edu>
> To: "bioc-sig-sequencing at r-project.org"
>       <bioc-sig-sequencing at r-project.org>
> Subject: [Bioc-sig-seq] interaction factor in edgeR
>
> Dear list users,
>
> I am not a statistician, so pardon my ignorance.
>
> When using the edgeR package to analyse RNA-seq data, the number of
> differentially expressed genes varies depending on whether I use an
> interaction factor in the design. Can anyone suggest why this happens?

Well, you fit a different model, and test a different hypothesis, so the
results change.  No doubt the residual dispersion has changed as well.
Wouldn't you be worried if the results didn't change?
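
A small base-R sketch of the point above, not taken from the original analysis: the two formulas give design matrices with different numbers of columns and residual degrees of freedom, and the coefficient for a answers a different question in each. The sample layout, level names and response values are invented for illustration.

```r
# Hypothetical unbalanced 2x2 layout (level names and counts are assumptions).
targets <- data.frame(
  a = factor(c("t1", "t1", "t1", "t2", "t2", "t1", "t1", "t2", "t2", "t2"),
             levels = c("t1", "t2")),
  b = factor(rep(c("normal", "cloned"), each = 5),
             levels = c("normal", "cloned"))
)

design_add <- model.matrix(~ a + b, data = targets)
design_int <- model.matrix(~ a + b + a:b, data = targets)

# Adding the interaction column costs one residual degree of freedom, so
# anything estimated from the residuals will generally change as well.
df_resid <- c(additive    = nrow(design_add) - ncol(design_add),
              interaction = nrow(design_int) - ncol(design_int))
print(df_resid)

# Toy noise-free response in which the effect of a is 2 in normal embryos
# and 5 in cloned embryos (made-up numbers).
cell_means <- c(t1.normal = 5, t2.normal = 7, t1.cloned = 6, t2.cloned = 11)
y <- unname(cell_means[paste(targets$a, targets$b, sep = ".")])

# The same coefficient position (2) under the two models:
a_add <- coef(lm(y ~ a + b, data = targets))[["at2"]]        # pooled over b
a_int <- coef(lm(y ~ a + b + a:b, data = targets))[["at2"]]  # within normal only
print(c(additive = a_add, interaction = a_int))
```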

> Example:
>
> if I use:
> design <- model.matrix(~ a + b  , data=targets)
>
> I have:
> summary(decideTests_eset_b_tmm)
>   [,1]
> -1  2855
> 0  12346
> 1   4928
>
> if I use:
> design <- model.matrix(~ a + b + a:b , data=targets)
>
> then:
> summary(decideTests_eset_b_tmm)
>   [,1]
> -1 3343
> 0  9490
> 1  4191

You haven't actually told us which coefficient you're testing for.

> When having more than one factor, is it more appropriate to have the
> interaction factor in the design?

Do you expect the effect of experimental factor b to be the same for each
level of a?  If yes, then maybe you don't need the interaction term.  It
depends on your experiment and on the questions you want to ask.
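
One way to look at that question in the data is to test the interaction coefficient itself. The following is only a sketch on simulated counts, assuming a current edgeR release (`estimateDisp`, `glmFit`, `glmLRT`); it is not the code used in this thread, and the sample layout, level names and count parameters are made up.

```r
library(edgeR)

set.seed(1)

# Hypothetical unbalanced 2x2 layout (level names and counts are assumptions).
targets <- data.frame(
  a = factor(c("t1", "t1", "t1", "t2", "t2", "t1", "t1", "t2", "t2", "t2"),
             levels = c("t1", "t2")),
  b = factor(rep(c("normal", "cloned"), each = 5),
             levels = c("normal", "cloned"))
)

# Simulated negative binomial counts with no true effects, for illustration.
n_genes <- 200
counts <- matrix(rnbinom(n_genes * nrow(targets), mu = 50, size = 10),
                 nrow = n_genes,
                 dimnames = list(paste0("gene", seq_len(n_genes)),
                                 paste0("s", seq_len(nrow(targets)))))

y <- DGEList(counts = counts)
y <- calcNormFactors(y)  # TMM normalisation

design <- model.matrix(~ a + b + a:b, data = targets)
y <- estimateDisp(y, design)
fit <- glmFit(y, design)

# Test the interaction column by name: does the effect of b differ between
# the two levels of a?
lrt <- glmLRT(fit, coef = "at2:bcloned")
print(topTags(lrt, n = 5))
```

Genes with a significant interaction are those where the effect of b appears to differ between the levels of a; for such genes a single main-effect coefficient for a is hard to interpret on its own.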

> Thanks a lot
> Best,
>
> Fernando

BTW, I would much prefer it if you would post questions about edgeR to the
main Bioconductor mailing list rather than to bioc-sig-sequencing.  The
questions relate more to the general problem of analysing gene expression
experiments rather than to details of particular sequencing technologies.

Best wishes
Gordon

---------------------------------------------
Professor Gordon K Smyth,
Bioinformatics Division,
Walter and Eliza Hall Institute of Medical Research,
1G Royal Parade, Parkville, Vic 3052, Australia.
Tel: (03) 9345 2326, Fax (03) 9347 0852,
smyth at wehi.edu.au
http://www.wehi.edu.au
http://www.statsci.org/smyth
