[R] mgcv: inclusion of random intercept in model - based on p-value of smooth or anova?
s.wood at bath.ac.uk
Thu Jul 19 11:23:50 CEST 2012
I've just submitted mgcv_1.7-19 to CRAN, which includes a major upgrade
of the p-value computation for random effect terms (and any other smooth
term which can be penalized to zero as part of estimation). The new
p-values are still conditional on the smoothing parameter/variance
component estimates, but given those estimates are based on the null
distribution of the restricted likelihood ratio statistic. Basically, if
you are going to condition on the smoothing parameters, then this is the
right thing to do, but if smoothing parameter estimates are very
uncertain, then the p-values may be underestimates (in simulations the
underestimation seems to be rather modest).
In the Gaussian mixed additive model case, the alternative is to use the
RLRsim package, while for GLMMs the only way to further improve things
is by writing code to simulate for the null distribution of the test
Thanks for pushing be into action on this!
On 25/06/12 16:31, Simon Wood wrote:
> I had a nagging feeling that there must be more to this than my previous
> reply suggested, and have been digging a bit further. Basically I would
> expect these p-values to not be great if you had nested random effects
> (such as main effects + their interaction), but your case looked rather
> After some digging it turns out that the p-value for Placename is wrong
> in this case, as you suspected. R routine `qr' has a `tol' argument that
> by default is set to 1e-7. In the computation of the test statistic for
> "re" terms I had called qr without changing this default to the value 0
> that is actually appropriate (I should have noticed this, I've been
> caught out by it before). With the correct 'tol', Placement is
> non-significant. This will be fixed in the next release, but that won't
> happen until I've tested the random effects p-value approximations a bit
> more thoroughly.
> On 11/06/12 14:56, Martijn Wieling wrote:
>> Dear Simon,
>> I ran an additional analysis using bam (mgcv 1.7-17) with three random
>> intercepts and no non-linearities, and compared these to the results
>> of lmer (lme4). Using bam results in a significant random intercept
>> (even though it has a very low edf-value), while the lmer results show
>> no variance associated to the random intercept of Placename. Should I
>> drop the random intercept of Placename and if so, how is this apparent
>> from the results of bam?
>> Summaries of both models are shown below.
>> With kind regards,
>> #### l1 = lmer(RefPMIdistMeanLog.c ~ geogamfit + (1|Word) + (1|Key) +
>> (1|Placename), data=wrddst); print(l1,cor=F)
>> Linear mixed model fit by REML
>> Formula: RefPMIdistMeanLog.c ~ geogamfit + (1 | Word) + (1 | Key) + (1
>> | Placename)
>> Data: wrddst
>> AIC BIC logLik deviance REMLdev
>> -44985 -44927 22498 -45009 -44997
>> Random effects:
>> Groups Name Variance Std.Dev.
>> Word (Intercept) 0.0800944 0.283009
>> Key (Intercept) 0.0013641 0.036933
>> Placename (Intercept) 0.0000000 0.000000
>> Residual 0.0381774 0.195390
>> Number of obs: 112608, groups: Word, 357; Key, 320; Placename, 40
>> Fixed effects:
>> Estimate Std. Error t value
>> (Intercept) -0.00342 0.01513 -0.23
>> geogamfit 0.99249 0.02612 37.99
>> #### m1 = bam(RefPMIdistMeanLog.c ~ geogamfit + s(Word,bs="re") +
>> s(Key,bs="re") + s(Placename,bs="re"), data=wrddst,method="REML");
>> Family: gaussian
>> Link function: identity
>> RefPMIdistMeanLog.c ~ geogamfit + s(Word, bs = "re") + s(Key,
>> bs = "re") + s(Placename, bs = "re")
>> Parametric coefficients:
>> Estimate Std. Error t value Pr(>|t|)
>> (Intercept) -0.00342 0.01513 -0.226 0.821
>> geogamfit 0.99249 0.02612 37.991<2e-16 ***
>> Signif. codes: 0 â***â 0.001 â**â 0.01 â*â 0.05 â.â 0.1 â â 1
>> Approximate significance of smooth terms:
>> edf Ref.df F p-value
>> s(Word) 3.554e+02 347 634.716<2e-16 ***
>> s(Key) 2.946e+02 316 23.054<2e-16 ***
>> s(Placename) 1.489e-04 38 7.282<2e-16 ***
>> Signif. codes: 0 â***â 0.001 â**â 0.01 â*â 0.05 â.â 0.1 â â 1
>> R-sq.(adj) = 0.693 Deviance explained = 69.4%
>> REML score = -22498 Scale est. = 0.038177 n = 112608
>> On Wed, May 23, 2012 at 11:30 AM, Simon Wood-4 [via R]
>> <ml-node+s789695n4631060h52 at n4.nabble.com> wrote:
>>> Having looked at this further, I've made some changes in mgcv_1.7-17 to
>>> the p-value computations for terms that can be penalized to zero during
>>> fitting (e.g. s(x,bs="re"), s(x,m=1) etc).
>>> The Wald statistic based p-values from summary.gam and anova.gam (i.e.
>>> what you get from e.g. anova(a) where a is a fitted gam object) are
>>> quite well founded for smooth terms that are non-zero under full
>>> penalization (e.g. a cubic spline is a straight line under full
>>> penalization). For such smooths, an extension of Nychka's (1988) result
>>> on CI's for splines gives a well founded distributional result on which
>>> to base a Wald statistic. However, the Nychka result requires the
>>> smoothing bias to be substantially less than the smoothing estimator
>>> variance, and this will often not be the case if smoothing can actually
>>> penalize a term to zero (to understand why, see argument in appendix of
>>> Marra& Wood, 2012, Scandinavian Journal of Statistics, 39,53-74).
>>> Simulation testing shows that this theoretical concern has serious
>>> practical consequences. So for terms that can be penalized to zero,
>>> alternative approximations have to be used, and these are now
>>> implemented in mgcv_1.7-17 (see ?summary.gam).
>>> The approximate test performed by anova(a,b) (a and b are fitted "gam"
>>> objects) is less well founded. It is a reasonable approximation when
>>> each smooth term in the models could in principle be well approximated
>>> by an unpenalized term of rank approximately equal to the edf of the
>>> smooth term, but otherwise the p-values produced are likely to be much
>>> too small. In particular simulation testing suggests that the test is
>>> not to be trusted with s(...,bs="re") terms, and can be poor if the
>>> models being compared involve any terms that can be penalized to zero
>>> during fitting. (Although the mechanisms are a little different, this is
>>> similar to the problem we would have if the models were viewed as
>>> regular mixed models and we tried to use a GLRT to test variance
>>> components for equality to zero).
>>> These issues are now documented in ?anova.gam and ?summary.gam...
>>> On 08/05/12 15:01, Martijn Wieling wrote:
>>>> Dear useRs,
>>>> I am using mgcv version 1.7-16. When I create a model with a few
>>>> non-linear terms and a random intercept for (in my case) country using
>>>> s(Country,bs="re"), the representative line in my model (i.e.
>>>> approximate significance of smooth terms) for the random intercept
>>>> edf Ref.df F p-value
>>>> s(Country) 36.127 58.551 0.644 0.982
>>>> Can I interpret this as there being no support for a random intercept
>>>> for country? However, when I compare the simpler model to the model
>>>> including the random intercept, the latter appears to be a significant
>>>> Model 1: ....
>>>> Model 2: .... + s(BirthNation, bs="re")
>>>> Resid. Df Resid. Dev Df Deviance F Pr(>F)
>>>> 1 789.44 416.54
>>>> 2 753.15 373.54 36.292 43.003 2.3891 1.225e-05 ***
>>>> I hope somebody could help me in how I should proceed in these
>>>> situations. Do I include the random intercept or not?
>>>> I also have a related question. When I used to create a mixed-effects
>>>> regression model using lmer and included e.g., an interaction in the
>>>> fixed-effects structure, I would test if the inclusion of this
>>>> interaction was warranted using anova(lmer1,lmer2). It then would show
>>>> me that I invested 1 additional df and the resulting (possibly
>>>> significant) improvement in fit of my model.
>>>> This approach does not seem to work when using gam. In this case an
>>>> apparent investment of 1 degree of freedom for the interaction, might
>>>> result in an actual decrease of the degrees of freedom invested by the
>>>> total model (caused by a decrease of the edf's of splines in the model
>>>> with the interaction). In this case, how would I proceed in
>>>> determining if the model including the interaction term is better?
>>>> With kind regards,
>>>> Martijn Wieling
>>>> Martijn Wieling
>>>> [hidden email]
>>>> University of Groningen
>>>> [hidden email] mailing list
>>>> PLEASE do read the posting guide
>>>> and provide commented, minimal, self-contained, reproducible code.
>>> Simon Wood, Mathematical Science, University of Bath BA2 7AY UK
>>> +44 (0)1225 386603 http://people.bath.ac.uk/sw283
>>> [hidden email] mailing list
>>> PLEASE do read the posting guide
>>> and provide commented, minimal, self-contained, reproducible code.
>>> If you reply to this email, your message will be added to the discussion
>>> To unsubscribe from mgcv: inclusion of random intercept in model -
>>> based on
>>> p-value of smooth or anova?, click here.
Simon Wood, Mathematical Science, University of Bath BA2 7AY UK
+44 (0)1225 386603 http://people.bath.ac.uk/sw283
More information about the R-help