[R] How do I compare 47 GLM models with 1 to 5 interactions and unique combinations?

Thu Jan 26 08:39:26 CET 2012

I think we have gone through this before.
First, the destruction of all aspects of statistical inference is not at stake, Frank Harrell's post notwithstanding.
Second, checking all pairs is a way to see for _all pairs_ which model outcompetes which in terms of predictive ability by -2AIC or more. Just sorting them by the AIC does not give you that if no model is better than the next best by less than 2AIC.
Third, I was not implying that AIC differences play the role of significance tests. I agree with you that model selection is better not understood as a proxy or as a relative of significance tests procedures.
Incidentally, when comparing many models the AIC is often inconclusive. If one is bent on selecting just _the model_ then I check numerical optimization diagnostics such as size of gradients, KKT criteria, and other issues such as standard errors of parameter estimates and the correlation matrix of parameter estimates. For some reasons I don't find model averaging appealing. I guess deep in my heart I expect more from my model than just the best predictive ability.

Rubén

-----Mensaje original-----
De: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] En nombre de Ben Bolker
Enviado el: miércoles, 25 de enero de 2012 15:41
Para: r-help at stat.math.ethz.ch
Asunto: Re: [R]How do I compare 47 GLM models with 1 to 5 interactions and unique combinations?

Rubén Roa <rroa <at> azti.es> writes:

> A more 'manual' way to do it is this.

> If you have all your models named properly and well organized, say 
> Model1, Model2, ..., Model 47, with a slot with the AIC (glm produces 
> a list with one component named 'aic') then something like
> this:

> x <- matrix(0,1081,3) x[,1:2] <- t(combn(47,2)) for(i in 1:n){x[,3]
> <- abs(unlist(combn(eval(as.name(paste("Model",i,sep="")))$aic[1,])-
> unlist(combn(eval(as.name(paste("Model",i,sep="")))$aic,2)[2,]))}

> will calculate all the 1081 AIC differences between paired comparisons 
> of the 47 models. It may not be pretty but works for me.

> An AIC difference equal or less than 2 is a tie, anything higher is 
> evidence for the model with the less AIC (Sakamoto et al., Akaike 
> Information Criterion Statistics, KTK Scientific Publishers, Tokyo).

  I wouldn't go quite as far as Frank Harrell did in his answer, but

(1) it seems silly to do all pairwise comparisons of models; one of the big advantages of information-theoretic approaches is that you can just list the models in order of AIC ...

(2) the dredge() function from the MuMIn package may be useful for automating this sort of thing.  There is an also an AICtab function in the emdbook package.

(3) If you're really just interested in picking the single model with the best expected predictive capability (and all of the other assumptions of the AIC approach are met -- very large data set, good fit to the data, etc.), then just picking the model with the best AIC will work.  It's when you start to make inferences on the individual parameters within that model, without taking account of the model selection process, that you run into trouble.  If you are really concerned about good predictions then it may be better to do multi-model averaging *or* use some form of shrinkage estimator.

(4) The "delta-AIC<2 means pretty much the same" rule of thumb is fine, but it drives me nuts when people use it as a quasi-significance-testing rule, as in "the simpler model is adequate if dAIC<2".  If you're going to work in the model selection arena, then don't follow hypothesis-testing procedures!  A smaller AIC means lower expected K-L distance, period.

For the record, Brian Ripley has often expressed the (minority) opinion that AIC is *not* appropriate for comparing non-nested models (e.g.  <http://tolstoy.newcastle.edu.au/R/help/06/02/21794.html>).

> 
> -----Original Message-----
> From: r-help-bounces <at> r-project.org on behalf of Milan 
> Bouchet-Valat
> Sent: Wed 1/25/2012 10:32 AM
> To: Jhope
> Cc: r-help <at> r-project.org

> Subject: Re: [R] How do I compare 47 GLM models with 1 to 5  
> interactions and unique combinations?

> Le mardi 24 janvier 2012 à 20:41 -0800, Jhope a écrit :
> > Hi R-listers,
> > 
> > I have developed 47 GLM models with different combinations of 
> > interactions from 1 variable to 5 variables. I have manually made 
> > each model separately and put them into individual tables (organized 
> > by the number of variables) showing the AIC score. I want to compare all of these models.
> > 
> > 1) What is the best way to compare various models with unique 
> > combinations and different number of variables?
> See ?step or ?stepAIC (from package MASS) if you want an automated way 
> of doing this.
> 
> > 2) I am trying to develop the most simplest model ideally. Even 
> > though adding another variable would lower the AIC,

  No, not necessarily.

> how do I interpret it is worth
> > it to include another variable in the model? How do I know when to stop? 

  When the AIC stops decreasing.

> This is a general statistical question, not specific to R. As a 
> general rule, if adding a variable lowers the AIC by a significant 
> margin, then it's worth including it.

  If the variable lowers the AIC *at all* then your best estimate is that you should include it.

> You should only stop when a variable increases the AIC. But this is 
> assuming you consider it a good indicator and you know what you're 
> doing. There's plenty of literature on this subject.
> 
> > Definitions of Variables:
> > HTL - distance to high tide line (continuous) Veg - distance to 
> > vegetation Aeventexhumed - Event of exhumation Sector - number 
> > measurements along the beach Rayos - major sections of beach 
> > (grouped sectors) TotalEggs - nest egg density
> > 
> > Example of how all models were created: 
> > Model2.glm <- glm(cbind(Shells, TotalEggs-Shells) ~ Aeventexhumed, 
> > data=data.to.analyze, family=binomial) Model7.glm <- 
> > glm(cbind(Shells, TotalEggs-Shells) ~ HTL:Veg, family = binomial, 
> > data.to.analyze) Model21.glm <- glm(cbind(Shells, TotalEggs-Shells) 
> > ~ HTL:Veg:TotalEggs, data.to.analyze, family = binomial) Model37.glm 
> > <- glm(cbind(Shells, TotalEggs-Shells) ~ 
> > HTL:Veg:TotalEggs:Aeventexhumed, data.to.analyze, family=binomial)
> To extract the AICs of all these models, it's easier to put them in a 
> list and get their AICs like this:
> m <- list()
> m$model2 <- glm(cbind(Shells, TotalEggs-Shells) ~ Aeventexhumed, 
> data=data.to.analyze, family=binomial)
> m$model3 <- glm(cbind(Shells, TotalEggs-Shells) ~ HTL:Veg, family = 
> binomial, data.to.analyze)
> 
> sapply(m, extractAIC)
> 
> Cheers
>

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.