[R] mgcv, testing gamm vs lme, which degrees of freedom?

Fri Jun 18 15:59:18 CEST 2010

I see this question is still open, so let me try to give you some
answer. As far as I understood, the smoothing splines are not used as
pure random effects, but as a combination of random and fixed effects.
Very simplified, the first two basis functions (intercept and linear
effect) are added as a fixed effect, whereas all other basis functions
go to the random component. Hence, the df for the log likelihood will
be indeed smaller than in the case of gam, where all splines are
considered fixed effects.

I've struggled with this issue myself as well, and I still don't
understand it fully. I believe I have the concepts down by now, but
the mathematical background requires more study for me I'm afraid. Did
you read the book of Simon Wood already?

http://www.amazon.ca/Generalized-Additive-Models-Introduction-R/dp/1584884746

Simon Wood advises to compare models using the anova's for the
lme-part of the object, i.e.
anova(model1$lme, model2$lme).
where one model contains the smoothing term of interest and the other
doesn't. I base most of my model testing in the gamm context on these
tests. To be completely correct, this -apparently- only counts for
gamms using the identity link.

Cheers
Joris

On Wed, Jun 16, 2010 at 9:33 PM, Carlo Fezzi <c.fezzi at uea.ac.uk> wrote:
> Dear all,
>
> I am using the "mgcv" package by Simon Wood to estimate an additive mixed
> model in which I assume normal distribution for the residuals. I would
> like to test this model vs a standard parametric mixed model, such as the
> ones which are possible to estimate with "lme".
>
> Since the smoothing splines can be written as random effects, is it
> correct to use an (approximate) likelihood ratio test for this? If so,
> which is the correct number of degrees of freedom? Sometime the function
> LogLik() seems to provide strange results regarding the number of degrees
> of freedom (df) for the gam, for instance in the example I copied below
> the df for the "gamm" are equal to the ones for the "lme", but the
> summary(model.gam) seems to indicate a much higher edf for the gamm.
>
> I would be very grateful to anybody who could point out a solution,
>
> Best wishes,
>
> Carlo
>
> Example below:
>
> ----
>
> rm(list = ls())
> library(mgcv)
> library(nlme)
>
> set.seed(123)
>
> x  <- runif(100,1,10)                           # regressor
> b0 <- rep(rnorm(10,mean=1,sd=2),each=10)        # random intercept
> id <- rep(1:10, each=10)                        # identifier
>
> y <- b0 + x - 0.1 * x^3 + rnorm(100,0,1)  # dependent variable
>
> f1 <- lme(y ~ x + I(x^2), random = list(id=~1) , method="ML" )  # lme model
>
> f2 <- gamm(y ~ s(x), random = list(id=~1), method="ML" )    # gamm
>
> ## same number of "df" according to logLik:
> logLik(f1)
> logLik(f2$lme)
>
> ## much higher edf according to summary:
> summary(f2$gam)
>
> -----------
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

-- 
Joris Meys
Statistical consultant

Ghent University
Faculty of Bioscience Engineering
Department of Applied mathematics, biometrics and process control

tel : +32 9 264 59 87
Joris.Meys at Ugent.be
-------------------------------
Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php