[BioC] edgeR choose the best model of GLM

Gordon K Smyth smyth at wehi.EDU.AU
Fri Feb 22 00:38:55 CET 2013


Dear Eszter,

The problem that you've stated (which of these models best describes my 
data) applies to univariate data, for fitting models to a single vector of 
counts.  For RNA-seq experiments, you are fitting glms to many data 
vectors.  You will get different answers for different genes, so you have 
to think a bit differently.

With edgeR, you would start by fitting the full model

   design <- model.matrix(~E+A+E:A)

then, after some intermediate commands, use

   fit <- glmFit(y, design)
   lrt <- glmLRT(fit, coef=4)
   topTags(lrt)

to test for which genes the interaction is significant.  If the test is 
significant, then the E*A model is necessary and best.  If not, one of the 
simpler models is satisfactory.  You could subset to genes for which the 
interaction is not significant, then try

   design <- model.matrix(~E+A)

and proceed from there.

I interpolate some other answers below.

> Date: Wed, 20 Feb 2013 15:15:20 +0100
> From: Ari Eszter <arieszter at gmail.com>
> To: bioconductor at r-project.org
> Subject: [BioC] edgeR choose the best model of GLM
>
> Dear edgeR Users,
>
> I have RNS-Seq count data by genes, 4 treatment groups (each with 2 
> biological replicates) and two factors (E: C or H, and A: 13 or 25). And 
> I applied the GLM model of edgeR with these designs: E, A, E+A and E*A

> I would like to decide which model and which variant describes my data 
> better.
>
> 1. Am I right that the $coefficients of a "DGELRT" object are log 
> likelihood values?

No.  They are coefficients from the linear model fitted, as explained in 
the documentation: help("glmFit").

> 2. Does anyone have a suggestion how can I apply AIC (Akaike Information 
> Criterion) test or Likelihood ratio (LR) test on edgeR GLM results to 
> choose the best model?

Answered above.  We have experimented with AIC, but have not found it 
particularly successful.

> 3. Is there a possibility to plot the multiple GLM regression for a 
> gene? (I would like plot something like this: 
> http://www.jerrydallal.com/LHSP/pix/regpix4.gif )

I have no idea what this plot represents.  In any case, I guess this is 
for one data vector.  For RNA-seq data, you would have to view 20,000 of 
these plots, not a very productive way to go.

Best wishes
Gordon

> Bests,
> Eszter
>
> -- 
> Eszter Ari
> Institut für Populationsgenetik
> Vetmeduni Vienna
> Veterinärplatz 1
> 1210 Wien, Austria
>

______________________________________________________________________
The information in this email is confidential and intend...{{dropped:5}}


More information about the Bioconductor mailing list