[BioC] visualise model fit in edgeR

Gordon K Smyth smyth at wehi.EDU.AU
Wed Nov 2 09:47:02 CET 2011

Dear Iain,

I think you might be after an interaction plot, see


I've never seen anyone do this for count data.  However I guess you could 
make such a plot approximately in edgeR by computing

   lcpm <- log2(cpm(y))

Then using

   interaction.plot(x.factor, trace.factor, response=lcpm[i,])

where 'i' is the gene of interest.

Best wishes

> Date: Mon, 31 Oct 2011 10:26:13 +0000 (GMT)
> From: Iain Gallagher <iaingallagher at btopenworld.com>
> To: Bioconductor mailing list <bioconductor at r-project.org>
> Subject: Re: [BioC] visualise model fit in edgeR
> Dear Gordon
> Thanks for your reply. There's nothing like someone else's question to
> make one focus on what exactly one wants. This was certainly the case
> here!
> I have given this some thought from my statisically naive
> point of view and I have attached a mock-up picture of the kind of thing
> I envisaged (although I appreciate the real life situation is more
> complicated).
> The experimental design is as follows:
> Cells
> were collected from 6 animals and infected with one of 4 strains of
> bacteria or left uninfected. RNA was sampled at 2, 6, 24 & 48 hours
> post infection. There are thus 120 data points across the whole
> experiment.
> I have used edgeR to analyse the infected v
> control data at each timepoint using the GLM approach? - effectively a
> paired samples analysis for each timepoint? as per the edgeR manual
> (section 11). Perhaps there's something more sophisticated I could do
> here though. If you had any advice that would be great!
> design <- model.matrix(~ cow + infection)
> #dispersion estimate
> d <- estimateGLMCommonDisp(d, design)
> #fit the NB GLM for each gene
> fitFiltered <- glmFit(d, design, dispersion = d$common.dispersion)
> #carry out the likliehood ratio test
> lrtFiltered <- glmLRT(d, fitFiltered, coef = 7)
> For
> my audience I simply wanted to illustrate the fitting of the two models
> and how likelihood ratio tests are used rather than a t-test approach.
> In the attached pdf each black line represents the H1 model (with
> infection) and each red line represents the null model (cows only) for
> one gene only. The points are the 'raw data' (but not real data); C =
> control, I = infected. I realise this illustration is showing
> essentially a linear fit but I'm trying to aim for simplicity for the
> audience (a conceptual rather than entirely accurate approach). I would
> be happy to get my hands dirty coding something more lifelike as I think
> that would aid my understanding as well.
> I was going to
> describe this in terms of the 'fit' of each line to the data i.e. for
> the regulated gene the black line is the more 'likely' model whereas in
> the non-regulated gene there is little to separate the models.
> Hope this is somewhat useful.
> Best
> Iain

The information in this email is confidential and intend...{{dropped:4}}

More information about the Bioconductor mailing list