[R] mgcv, testing gamm vs lme, which degrees of freedom?

Fri Jun 18 19:17:07 CEST 2010

Just realized something: You should take into account that the LR test
is actually only valid for _nested_ models. Your models are not
nested. Hence, you shouldn't use the anova function to compare them,
and you shouldn't compare the df. In fact, if you're interested in the
contribution of a term, then using anova to compare the model with
that term and without that term gives you an answer on the hypothesis
whether that term with spline contributes significantly to the model.

> f2 <- gamm(y ~ s(x), random = list(id=~1), method="ML")

> f3 <- gamm(y ~ x, random = list(id=~1), method="ML" )

> f4 <- gamm(y ~ 1, random = list(id=~1), method="ML" )

> anova(f3$lme,f2$lme)
       Model df AIC BIC logLik   Test L.Ratio p-value
f3$lme     1  4 760 770   -376
f2$lme     2  5 381 394   -186 1 vs 2     380  <.0001

> anova(f4$lme,f2$lme)
       Model df AIC BIC logLik   Test L.Ratio p-value
f4$lme     1  3 945 953   -470
f2$lme     2  5 381 394   -186 1 vs 2     568  <.0001

> anova(f3$lme,f4$lme)
       Model df AIC BIC logLik   Test L.Ratio p-value
f3$lme     1  4 760 770   -376
f4$lme     2  3 945 953   -470 1 vs 2     188  <.0001

This is the correct application of a likelihood ratio test. You see
that adding the spline increases the df with 1 compared to the linear
model, as part of the spline gets into the random component. Notice as
well that the interpretation of a test in case of a random component
is not the same as in case of a fixed component. If I understood
correctly, this LR test specifically says something over the effect of
X, without being interested in the shape of the spline. The
"significance of a spline" is a difficult concept anyway, as a spline
can be seen as a form of local regression. It's exactly the use of the
randomization that allows for a general hypothesis about the added
value of the spline, without focusing on its actual shape. Hence the
"freedom" connected to that actual shape should not be used in the df
used to test the general hypothesis.

Hope this makes sense someway...

Cheers
Joris

On Fri, Jun 18, 2010 at 6:27 PM, Carlo Fezzi <c.fezzi at uea.ac.uk> wrote:
> Dear Simon,
>
> thanks a lot for your prompt reply.
>
> Unfortunately I am still confused about which is the correct way to test
> the two models... as you point out: why in my example the two models have
> the same degrees of freedom?
>
> Intuitively it seems to me the gamm model is more flexible since, as I
> understand also from you response, it should contain more random effects
> than the other model because some of the smooth function parameters are
> represented as such. This should not be taken into account when testing
> one model vs the other?
>
> Continuing with my example, the two models:
>
> f2 <- gamm(y ~ s(x), random = list(id=~1), method="ML")
> f3 <- gamm(y ~ x + I(x^2), random = list(id=~1), method="ML" )
>
> Can be tested with:
>
> anova(f3$lme,f2$lme)
>
> But why are the df the same? Model f2 appears to be more flexible and, as
> such, should have more (random) parameters. Should not a test of one model
> vs the other take this into account?
>
> Sorry if this may sound dull, many thanks for your help,
>
> Carlo
>
>
>
>> On Wednesday 16 June 2010 20:33, Carlo Fezzi wrote:
>>> Dear all,
>>>
>>> I am using the "mgcv" package by Simon Wood to estimate an additive
>>> mixed
>>> model in which I assume normal distribution for the residuals. I would
>>> like to test this model vs a standard parametric mixed model, such as
>>> the
>>> ones which are possible to estimate with "lme".
>>>
>>> Since the smoothing splines can be written as random effects, is it
>>> correct to use an (approximate) likelihood ratio test for this?
>> -- yes this is ok (subject to the usual caveats about testing on the
>> boundary
>> of the parameter space) but your 2 example models below will have  the
>> same
>> number of degrees of freedom!
>>
>>> If so,
>>> which is the correct number of degrees of freedom?
>> --- The edf from the lme object, if you are testing using the log
>> likelihood
>> returned by the  lme representation of the model.
>>
>>> Sometime the function
>>> LogLik() seems to provide strange results regarding the number of
>>> degrees
>>> of freedom (df) for the gam, for instance in the example I copied below
>>> the df for the "gamm" are equal to the ones for the "lme", but the
>>> summary(model.gam) seems to indicate a much higher edf for the gamm.
>> --- the edf for the lme representation of the model counts only the fixed
>> effects + the variance parameters (which includes smoothing parameters).
>> Each
>> smooth typically contributes only one or two fixed effect parameters, with
>> the rest of the coefficients for the smooth treated as random effects.
>>
>> --- the edf for the gam representation of the same model differs in that
>> it
>> also counts the *effective* number of parameters used to represent each
>> smooth: this includes contributions from all those coefficients that the
>> lme
>> representation treated as strictly random.
>>
>> best,
>> Simon
>>
>>
>>> I would be very grateful to anybody who could point out a solution,
>>>
>>> Best wishes,
>>>
>>> Carlo
>>>
>>> Example below:
>>>
>>> ----
>>>
>>> rm(list = ls())
>>> library(mgcv)
>>> library(nlme)
>>>
>>> set.seed(123)
>>>
>>> x  <- runif(100,1,10)                                # regressor
>>> b0 <- rep(rnorm(10,mean=1,sd=2),each=10)     # random intercept
>>> id <- rep(1:10, each=10)                     # identifier
>>>
>>> y <- b0 + x - 0.1 * x^3 + rnorm(100,0,1)  # dependent variable
>>>
>>> f1 <- lme(y ~ x + I(x^2), random = list(id=~1) , method="ML" )  # lme
>>> model
>>>
>>> f2 <- gamm(y ~ s(x), random = list(id=~1), method="ML" )    # gamm
>>>
>>> ## same number of "df" according to logLik:
>>> logLik(f1)
>>> logLik(f2$lme)
>>>
>>> ## much higher edf according to summary:
>>> summary(f2$gam)
>>>
>>> -----------
>>>
>>> ______________________________________________
>>> R-help at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide
>>> http://www.R-project.org/posting-guide.html and provide commented,
>>> minimal,
>>> self-contained, reproducible code.
>>
>> --
>>> Simon Wood, Mathematical Sciences, University of Bath, Bath, BA2 7AY UK
>>> +44 1225 386603  www.maths.bath.ac.uk/~sw283
>>
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

-- 
Joris Meys
Statistical consultant

Ghent University
Faculty of Bioscience Engineering
Department of Applied mathematics, biometrics and process control

tel : +32 9 264 59 87
Joris.Meys at Ugent.be
-------------------------------
Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php