[R] mgcv, testing gamm vs lme, which degrees of freedom?

Fri Jun 18 18:17:28 CEST 2010

Seems like Simon answered your question already, but indeed, I think
it is correct. I raised the same question here at the department a
while ago, not believing it could actually give the correct results.
Yet, the "underestimation" of the degrees of freedom is
counterbalanced by the addition of the rest of the splines to the
random component of the variance-covariance structure. Again, I can't
prove it mathematically, but when looking upon it this way, it feels
intuitively correct to me.

Cheers
Joris

On Fri, Jun 18, 2010 at 4:24 PM, Carlo Fezzi <c.fezzi at uea.ac.uk> wrote:
> Hi Joris,
>
> thanks for your reply, sorry I was being sloppy regarding the random
> effect representation fo the smooth functions.
>
> I am using the identity link, but I am not sure if the anova command will
> work since the df are the same as in Loglik. For example, if you add the
> following code to my previous example:
>
>
> f3 <- gamm(y ~ x + I(x^2), random = list(id=~1), method="ML")
> anova(f3$lme,f2$lme)
>
> You will see that the number of degree of freedom of model f2 and f3 is
> the same. This seems very strange to me, do you think is correct?
>
> Best wishes,
>
> Carlo
>
>> I see this question is still open, so let me try to give you some
>> answer. As far as I understood, the smoothing splines are not used as
>> pure random effects, but as a combination of random and fixed effects.
>> Very simplified, the first two basis functions (intercept and linear
>> effect) are added as a fixed effect, whereas all other basis functions
>> go to the random component. Hence, the df for the log likelihood will
>> be indeed smaller than in the case of gam, where all splines are
>> considered fixed effects.
>>
>> I've struggled with this issue myself as well, and I still don't
>> understand it fully. I believe I have the concepts down by now, but
>> the mathematical background requires more study for me I'm afraid. Did
>> you read the book of Simon Wood already?
>>
>> http://www.amazon.ca/Generalized-Additive-Models-Introduction-R/dp/1584884746
>>
>> Simon Wood advises to compare models using the anova's for the
>> lme-part of the object, i.e.
>> anova(model1$lme, model2$lme).
>> where one model contains the smoothing term of interest and the other
>> doesn't. I base most of my model testing in the gamm context on these
>> tests. To be completely correct, this -apparently- only counts for
>> gamms using the identity link.
>>
>> Cheers
>> Joris
>>
>> On Wed, Jun 16, 2010 at 9:33 PM, Carlo Fezzi <c.fezzi at uea.ac.uk> wrote:
>>> Dear all,
>>>
>>> I am using the "mgcv" package by Simon Wood to estimate an additive
>>> mixed
>>> model in which I assume normal distribution for the residuals. I would
>>> like to test this model vs a standard parametric mixed model, such as
>>> the
>>> ones which are possible to estimate with "lme".
>>>
>>> Since the smoothing splines can be written as random effects, is it
>>> correct to use an (approximate) likelihood ratio test for this? If so,
>>> which is the correct number of degrees of freedom? Sometime the function
>>> LogLik() seems to provide strange results regarding the number of
>>> degrees
>>> of freedom (df) for the gam, for instance in the example I copied below
>>> the df for the "gamm" are equal to the ones for the "lme", but the
>>> summary(model.gam) seems to indicate a much higher edf for the gamm.
>>>
>>> I would be very grateful to anybody who could point out a solution,
>>>
>>> Best wishes,
>>>
>>> Carlo
>>>
>>> Example below:
>>>
>>> ----
>>>
>>> rm(list = ls())
>>> library(mgcv)
>>> library(nlme)
>>>
>>> set.seed(123)
>>>
>>> x  <- runif(100,1,10)                           # regressor
>>> b0 <- rep(rnorm(10,mean=1,sd=2),each=10)        # random intercept
>>> id <- rep(1:10, each=10)                        # identifier
>>>
>>> y <- b0 + x - 0.1 * x^3 + rnorm(100,0,1)  # dependent variable
>>>
>>> f1 <- lme(y ~ x + I(x^2), random = list(id=~1) , method="ML" )  # lme
>>> model
>>>
>>> f2 <- gamm(y ~ s(x), random = list(id=~1), method="ML" )    # gamm
>>>
>>> ## same number of "df" according to logLik:
>>> logLik(f1)
>>> logLik(f2$lme)
>>>
>>> ## much higher edf according to summary:
>>> summary(f2$gam)
>>>
>>> -----------
>>>
>>> ______________________________________________
>>> R-help at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide
>>> http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>>
>>
>>
>>
>> --
>> Joris Meys
>> Statistical consultant
>>
>> Ghent University
>> Faculty of Bioscience Engineering
>> Department of Applied mathematics, biometrics and process control
>>
>> tel : +32 9 264 59 87
>> Joris.Meys at Ugent.be
>> -------------------------------
>> Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php
>>
>
>
>

-- 
Joris Meys
Statistical consultant

Ghent University
Faculty of Bioscience Engineering
Department of Applied mathematics, biometrics and process control

tel : +32 9 264 59 87
Joris.Meys at Ugent.be
-------------------------------
Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php