[R] Smooth terms significance in GAM models

Fri Sep 30 15:40:42 CEST 2005

> I'm using the gam() function from package mgcv with the default options (edf
> estimated by GCV).
>
> >G=gam(y ~ s(x0, k = 5) + s(x1) + s(x2, k = 3))
> >SG=summary(G)
> Formula:
> y ~ +s(x0, k = 5) + s(x1) + s(x2, k = 3)
>
> Parametric coefficients:
>               Estimate  std. err.    t ratio    Pr(>|t|)
> (Intercept)  3.462e+07  1.965e+05      176.2    < 2.22e-16
>
> Approximate significance of smooth terms:
>                edf         chi.sq     p-value
>  s(x0)      2.858       70.629     1.3129e-07
>  s(x1)      8.922       390.39     2.6545e-13
>  s(x2)      1.571        141.6     1.8150e-11
>
> R-sq.(adj) =  0.955   Deviance explained =   97%
> GCV score = 2.4081e+12   Scale est. = 1.5441e+12  n = 40
> --------------------------------------
>
> I know I can assess the significance of smooth terms with chi.sq and
> p-value.
>
> With GCV, p-values are obtained by comparing the statistic to an F
> distribution, aren't they?
> help(summary.gam) says "use at your own risk!". Does that mean I should
> only assess the significance of smooth terms by chi.sq? Is there a way to
> link the two pieces of information (p-value and chi.sq)?

No. Using F as the reference distribution is always more conservative than
using chi-squared, so relying on chi.sq alone would be even worse. The p-values
are *very approximate*: they are based on pretending that a penalized fit is
equivalent to an unpenalized fit with the same effective degrees of freedom,
and they neglect the uncertainty associated with smoothing parameter
estimation. They provide a reasonable `rough guide' to significance, but are
by no means exact.
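
To see how the two reference distributions differ, here is a rough sketch in base R using the edf and chi.sq values from the summary quoted above. It assumes the residual degrees of freedom are n minus the total edf (intercept included) and that the F statistic is chi.sq divided by its edf. It is an illustration of the comparison, not a claim about what summary.gam does internally.

```r
# Values copied from the summary output quoted above.
n      <- 40
edf    <- c(x0 = 2.858, x1 = 8.922, x2 = 1.571)
chi_sq <- c(x0 = 70.629, x1 = 390.39, x2 = 141.6)

# Assumption: residual df = n - (intercept + total smooth edf).
resid_df <- n - (1 + sum(edf))

# Chi-squared reference: treats the scale parameter as if it were known.
p_chisq <- pchisq(chi_sq, df = edf, lower.tail = FALSE)

# F reference: statistic divided by its edf, with the residual df in the
# denominator to allow for the scale parameter being estimated.
p_F <- pf(chi_sq / edf, df1 = edf, df2 = resid_df, lower.tail = FALSE)

print(cbind(edf, chi_sq, p_chisq, p_F))
```

With only about 26 residual degrees of freedom, the F-based p-values come out larger than the chi-squared ones for every term, which is the sense in which F is the more conservative choice.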

> Last question: using GAM with the defaults, should I look at R-sq rather than
> Deviance explained, or both?

In this case deviance explained is just the unadjusted r^2... I'd look at
the reported r^2, which is adjusted (to take into account the degrees of
freedom `used up' when estimating the model).
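
The relationship between the two figures can be checked on a default (Gaussian) fit. The sketch below uses simulated data, since the poster's data are not available; the variable names, sample size and true functions are all made up for illustration.

```r
library(mgcv)

set.seed(1)                      # hypothetical simulated data, not the poster's
n  <- 200
x0 <- runif(n); x1 <- runif(n); x2 <- runif(n)
y  <- sin(2 * pi * x0) + exp(2 * x1) + x2 + rnorm(n, sd = 0.5)

G  <- gam(y ~ s(x0, k = 5) + s(x1) + s(x2, k = 3))
SG <- summary(G)

# Unadjusted r^2 computed by hand from residual and total sums of squares.
rss      <- sum(residuals(G, type = "response")^2)
tss      <- sum((y - mean(y))^2)
r2_unadj <- 1 - rss / tss

dev_expl <- SG$dev.expl   # proportion of deviance explained
r2_adj   <- SG$r.sq       # adjusted r^2

print(c(dev_expl = dev_expl, r2_unadj = r2_unadj, r2_adj = r2_adj))
```

For this Gaussian fit the deviance explained should agree with the hand-computed unadjusted r^2, while the adjusted r^2 is somewhat smaller because it penalizes the effective degrees of freedom used.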

best,
Simon