[R] help interpreting output?

Tue Jan 7 17:41:02 CET 2003

The second model has run into numerical problems, but I guess
you didn't need me to tell you that! Usually this results from model
identifiability problems, for example if one predictor variable is a
simple transformation of another (or, in the *generalized* case, if the
fitted values cease to uniquely determine the linear predictor, as can
happen if the fitted values are essentially zero over a wide region of
the covariate space and a log link is used). mgcv does some simple checks
to try to catch the most common ways in which models run into these
difficulties (for example, specifying a higher smoothing basis dimension
than can be supported by the number of unique covariate combinations), but
there's no way of catching all such problems.
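
As a rough way to look for the kinds of identifiability problem described
above before fitting, something like the following base R sketch may help.
The data frame `dat` and all of its values are made up for illustration
(only the column names echo the posted formula), and these are not the
checks that mgcv itself performs.

```r
# Hypothetical data standing in for the poster's 60 observations.
set.seed(1)
n <- 60
dat <- data.frame(
  IGS = runif(n),
  L2E = runif(n),
  OPD = sample(1:4, n, replace = TRUE)  # a covariate with few distinct values
)
dat$TED <- 2 * dat$L2E + 1  # deliberately an exact transformation of L2E

# 1. Unique values per covariate: a smooth of one covariate cannot support
#    a basis dimension larger than this count.
n_unique <- sapply(dat, function(x) length(unique(x)))
print(n_unique)

# 2. Rank of the linear design matrix: a rank below the number of columns
#    means some predictor is a linear combination of the others.
X <- model.matrix(~ IGS + L2E + OPD + TED, data = dat)
rank_X <- qr(X)$rank
cat("columns:", ncol(X), " rank:", rank_X, "\n")

# 3. Rank correlations close to +/-1 flag monotone (not only linear)
#    transformations of one predictor into another.
rho <- cor(dat, method = "spearman")
print(round(rho, 2))
```

A rank deficiency in step 2 or a correlation of essentially 1 in step 3 is
a strong hint; a clean result does not rule out subtler problems.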

I assume that the second model came with warnings that the termwise edfs
are unreliable. The calculation of the estimated degrees of freedom for
each smooth is not as numerically stable as the actual model fitting, so
models which are somewhat unstable can fit without problems but then cause
problems when calculating diagnostics.
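
To give a feel for what a termwise edf measures, here is a minimal sketch
in base R. It assumes the usual penalized least squares definition, in
which the edf of a term is the sum of the corresponding diagonal elements
of (X'X + lambda*S)^{-1} X'X. The polynomial basis, the ridge-type penalty
`S` and the values of `lambda` are all stand-ins chosen for illustration,
not what mgcv uses internally, and nothing here is a claim about what
happened in the fit above.

```r
set.seed(2)
n <- 60
x <- runif(n)

# Intercept plus a 5-column orthogonal polynomial basis (a stand-in for a
# spline basis).
X <- cbind(1, poly(x, 5))

# Ridge-type penalty on the 5 basis coefficients; the intercept is unpenalized.
S <- diag(c(0, rep(1, 5)))

# edf of the smooth term for a given smoothing parameter lambda.
edf_for <- function(lambda) {
  XtX <- crossprod(X)
  Fmat <- solve(XtX + lambda * S, XtX)
  sum(diag(Fmat)[-1])  # drop the intercept's element
}

lambdas <- 10^c(-6, 0, 6)
edfs <- sapply(lambdas, edf_for)
print(data.frame(lambda = lambdas, edf = edfs))
```

With this toy basis the edf runs from nearly 5 (almost no smoothing) down
to nearly 0 (the term is penalized away almost entirely), so an edf of
order 1e-05, if it could be trusted, would correspond to a term that has
been shrunk to practically nothing.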

More generally, I'd be a bit nervous about trying to estimate 5 or 6 smooth
terms and their degrees of freedom from 60 observations (but I don't think
that this is the cause of the numerical problems).
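
A back-of-envelope count shows why 60 observations is not much here. The
sketch below assumes a basis dimension of k = 10 for each s() term, with
one coefficient per smooth removed by a centering constraint; that value
of k is an assumption, so check what your version of mgcv actually uses.

```r
n_obs <- 60
k <- 10                 # assumed basis dimension per smooth
n_smooths <- c(5, 6)    # the two models in question

# One intercept plus (k - 1) coefficients per smooth.
n_coef <- 1 + n_smooths * (k - 1)

print(data.frame(n_smooths, n_coef, obs_per_coef = n_obs / n_coef))
```

Under that assumption the models carry 46 and 55 coefficients before
penalization, which leaves very little information per coefficient for
choosing the smoothing parameters.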

If you can't spot an obvious identifiability issue, please let me know in
case it's a bug.

Simon Wood

> 
> > Dear R experts,
> >
> > I'm hoping someone can help me to interpret the results of building
> > gam's with mgcv in R.
> >
> > Below are summaries of two gam's based on the same dataset.  The first
> > gam (named "mod.gam") has six predictor variables.  The second gam
> > (named "mod.gam2") is exactly the same except it is missing one of the
> > predictor variables.  What is confusing me is the estimated degrees of
> > freedom for each of the splines in the second model....
> >
> > ________________
> >
> >  > summary.gam(mod.gam)
> >
> > Family: gaussian
> > Link function: identity
> >
> > Formula:
> > INT ~ s(IGS) + s(L2E) + s(TED) + s(PSD) + s(OPD) + s(GED)
> >
> > Parametric coefficients:
> >            Estimate  std. err.    t ratio    Pr(>|t|)
> > constant     302.32      5.192      58.23    < 2.22e-16
> >
> > Approximate significance of smooth terms:
> >               edf       chi.sq     p-value
> > s(IGS)      4.254       58.308     9.5524e-12
> > s(L2E)          1       8.7673     0.0030668
> > s(TED)          1       8.3915     0.0037697
> > s(PSD)          1       6.0234     0.014118
> > s(OPD)      2.289       12.745     0.0024349
> > s(GED)      3.791       152.68     < 2.22e-16
> >
> > R-sq.(adj) = 0.885   Deviance explained = 91.1%
> > GCV score = 2124.9   Scale est. = 1617.3    n = 60
> >
> > ________________
> >
> >  > summary.gam(mod.gam2)
> >
> > Family: gaussian
> > Link function: identity
> >
> > Formula:
> > INT ~ s(IGS) + s(L2E) + s(TED) + s(PSD) + s(OPD)
> >
> > Parametric coefficients:
> >            Estimate  std. err.    t ratio    Pr(>|t|)
> > constant     302.32  4.736e-14  6.384e+15    < 2.22e-16
> >
> > Approximate significance of smooth terms:
> >               edf       chi.sq     p-value
> > s(IGS)  1.757e-05   1.3524e+09     < 2.22e-16
> > s(L2E)   0.009991      0.21394     0.6437
> > s(TED)  2.945e-05   1.4913e+07     < 2.22e-16
> > s(PSD)  2.566e-05   6.5495e+06     < 2.22e-16
> > s(OPD)  5.023e-05   3.2332e+07     < 2.22e-16
> >
> > R-sq.(adj) = 0.645   Deviance explained = 64.5%
> > GCV score = 7489.7   Scale est. = 6069.7    n = 60
> >
> >
> > ________________
> >
> >
> > Any suggestions about either (1) what went wrong with the second model,
> > or (2) how the heck I should interpret these results?
> >
> > Thanks,
> >
> > Mike.
> >
> > ______________________________________________
> > R-help at stat.math.ethz.ch mailing list
> > http://www.stat.math.ethz.ch/mailman/listinfo/r-help
> >