[R] GLM model vs. GAM model

Thu Oct 28 13:40:56 CEST 2004

For GAMs fitted using mgcv::gam, I think that there's an argument that the 
most self consistent comparison would be based on the GCV/UBRE score for 
the models, since this has usually been used to select the degree of 
smoothness of the GAM model (you can fit the GLM using gam(), so this is 
available for both models). 

As you (and the help files) point out, the p-values produced by anova.gam 
are only approximate, when the models are fitted using penalties on the 
likelihood. There are two issues here, I think. 
 i) The calculations are conditional on the smoothing parameters, which 
   have usually been estimated.
 ii) Given the smoothing parameters, the GAMs are estimated by penalized 
   likelihood maximization, not MLE. 

(ii)is an issue when comparing two or more models using anova.gam, since 
the calculations use the MLE theory based distributional results, but with 
the model degrees of freedom set to the effective degrees of freedom of 
the model. This is not as outrageous as it sounds: the GAM can always be 
re-parameterized using the eigen-vectors of its penalty matrix. This 
parameterization is orthogonal, and has a diagonal penalty matrix with a 
elements that increase rather smoothly and steeply. This tends to mean 
that, for any given smoothing parameter, some model parameters are 
effectively zeroed while some are effectively un-penalized. If all parameters 
were in one of these two categories then using distributional theory for 
un-penalized GLMs, with model degrees of freedom set to the EDF of the GAM 
would be perfectly legitimate. Of course in reality there are almost 
always a number of parameters subject to intermediate amounts of 
penalization. The approach used in anova.gam is effectively saying 
that these are equivalent to a smaller number of unpenalized parameters, 
which can only be an approximation!

Note that single argument calls to anova.gam do not use the MLE based 
results. 

Interestingly, taking a mixed model approach (e.g. mgcv::gamm, for simple 
version) doesn't make things much better: in that case you *can* get 
unpenalized MLE's, but testing for presence or absence of smooth terms 
involves LRT's at the boundary of the parameter space, while other tests 
are often conditional on the parameters of the random effect 
distributions... 

> I have a question about how to compare a GLM with a GAM model using anova
> function.
> 
> A GLM is performed for example: 
> 
> model1 <-glm(formula = exitus ~ age+gender+diabetes, family = "binomial",
> na.action = na.exclude)
> 
> A second nested model could be:
> 
> model2 <-glm(formula = exitus ~ age+gender, family = "binomial", na.action =
> na.exclude)
> 
>  
> 
> To compare these two GLM models the instruction is: 
> 
> anova(model1,model2, test="F")
> 
>  
> 
> Similarly for GAM models
> 
> model3 <-gam(formula = exitus ~ s(age)+gender, family = "binomial",
> na.action = na.exclude)
> 
>  
> 
> "R" allows to compare these two models (GLM vs. GAM)
> 
> anova(model2,model3, test="F") 
> 
> This instruction returns a p-value with no error or warning, but this test
> is based on maximum likelihood, and GAM models are not fitted with maximum
> likelihood criteria, thus I think this p-value is not correct.
> 
> Please, I really appreciate any information about how to compare a GLM with
> a GAM model. 

best,
Simon
_____________________________________________________________________
> Simon Wood simon at stats.gla.ac.uk        www.stats.gla.ac.uk/~simon/
>>  Department of Statistics, University of Glasgow, Glasgow, G12 8QQ
>>>   Direct telephone: (0)141 330 4530          Fax: (0)141 330 4814