[R] compare gam fits

Gavin Simpson gavin.simpson at ucl.ac.uk
Tue Aug 17 21:34:49 CEST 2010


On Thu, 2010-08-05 at 13:49 -0300, Mike Lawrence wrote:
> Hi folks,

I've been pondering this for a little while, drumming up courage to
reply, whilst I've been away on fieldwork with patchy email access... so
here goes with the neck sticking out thing... ;-)

> I originally tried R-SIG-Mixed-Models for this one
> (https://stat.ethz.ch/pipermail/r-sig-mixed-models/2010q3/004170.html),
> but I think that the final steps to a solution aren't mixed-model
> specific, so I thought I'd ask my final questions here.

Two things spring to mind:

1) I don't know enough to know if these models are nested but there
should be a way to make them nested...

Anyway, why not fit a model with a 'by' term and a model without the 'by'
term and compare them using a likelihood ratio test? Something like:

m1 <- gamm(y ~ B + s(C, by = B), data = foo,
           random = list(A = ~ 1), method = "ML", ....)
m2 <- gamm(y ~ s(C), data = foo, random = list(A = ~ 1),
           method = "ML", ....)

anova(m1$lme, m2$lme)

With s() smooths these models won't be strictly nested. You could use
te() smooths instead, but I don't know the specifics of how to ensure
the models really are nested; Simon Wood may be able to help you if you
go down this route.

2) Do the whole thing in gam(), which can also deal with simple random
effects. You'll need to look at ?gam.models to see how to set these
things up, but at least you won't have to deal with potential problems
of fitting these things via nlme::lme. You'd still need to work out the
nesting issue and then fit the models with the same formulae as above,
but without the random bit.
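To make that concrete, here's a rough sketch of what I mean (purely
illustrative; 'foo', 'y', 'A', 'B' and 'C' are the names from your
description, and the "re" basis for simple random effects is only
available in recent versions of mgcv):

```r
library(mgcv)

## Sketch only: 'foo' is your data frame with response y, subject factor A,
## two-level factor B, and continuous C. s(A, bs = "re") gives a simple
## random intercept for each subject.
m1 <- gam(y ~ B + s(C, by = B) + s(A, bs = "re"),
          data = foo, method = "ML")
m2 <- gam(y ~ s(C) + s(A, bs = "re"),
          data = foo, method = "ML")

## approximate likelihood ratio test, as with the gamm versions above;
## the same caveat about the models not being strictly nested applies
anova(m2, m1, test = "Chisq")
```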

> I used gamm4 to fit a generalized additive mixed model to data from an
> AxBxC design, where A is a random effect (human participants in an
> experiment), B is a 2-level factor predictor variable, and C is a
> continuous variable that is likely non-linear. I tell gamm4 to fit a
> smooth across C to each level of B independently, and I can use
> predict.gam(...,se.fit=T) to obtain predictions from the fitted model
> as well as the standard error for the prediction. I'd like to
> visualize the BxC interaction to see if smoothing C within each level
> of B was really necessary, and if so, where it is (along the C
> dimension) that B affects the smooth. It's easy enough to obtain the
> predicted B1-B2 difference function, but I'm stuck on how to convey
> the uncertainty of this function (e.g. computing the confidence
> interval of the difference at each value of C).
> 
> One thought is that predict.gam(...,se.fit=T) returns SE values, so if
> I could find out the N on which these SE values are computed, I could
> compute the difference CI as
> sqrt( ( (SD_B1)^2 + (SD_B2)^2 ) / N ) * qt( .975, df=N-1 )
> 
> However, I can't seem to figure out what value of N was used to
> compute the SEs that predict.gam(...,se.fit=T) produces. Can anyone
> point me to where I might find N?
> 
> Further, is N-1 the proper df for the call to qt()?
> 
> Finally, with a smooth function and 95% confidence intervals computed
> at each of a large number of points, don't I run into a problem of an
> inflated Type I error rate? Or does the fact that each point is not
> independent from those next to it make this an inappropriate concern?

IIRC, yes. These will be pointwise confidence intervals and won't hold
across the whole fitted smooth at the 95% level. Simon's book
demonstrates two methods that could be used to generate simultaneous
confidence intervals via simulation from the fitted models. The first
involves simulating from a multivariate normal distribution with
parameters given by the coefficients of the model. This will yield
confidence intervals that are conditional upon the smoothing parameter
(i.e. they will be a bit narrow as they don't take into account the fact
that the smoothing parameter was estimated and not known a priori). To
get round this, a parametric bootstrap can be used to simulate from the
model. An example for a Poisson model is given in Simon's book for the
mackerel egg example.
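For the first of those two methods, the sort of thing I mean is sketched
below (again purely illustrative; 'm' stands for your fitted gam, or the
$gam component of a gamm fit, and 'newd' for a data frame of prediction
locations):

```r
library(mgcv)
library(MASS)   # for mvrnorm()

## Simulate coefficient vectors from their approximate posterior,
## N(coef(m), vcov(m)). This conditions on the estimated smoothing
## parameters, so the resulting band will be a little too narrow.
Xp   <- predict(m, newdata = newd, type = "lpmatrix")
sims <- mvrnorm(10000, coef(m), vcov(m))
fits <- Xp %*% t(sims)          # each column is one simulated curve

fit <- Xp %*% coef(m)
se  <- sqrt(rowSums((Xp %*% vcov(m)) * Xp))

## critical value for a simultaneous 95% band: the 95th percentile of
## the maximum absolute standardised deviation over the whole grid
crit <- quantile(apply(abs(fits - c(fit)) / se, 2, max), 0.95)
lwr  <- fit - crit * se
upr  <- fit + crit * se
```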

In the back of my mind I recall reading something that niggles me about
your approach and how you would interpret these confidence intervals
(perhaps it applies only to the pointwise ones...?). I am not 100% sure
that you can do the sort of comparison you want by judging the curves to
be different simply because one lies outside the other's confidence
interval...?
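That said, if what you want is a pointwise interval on the B1 - B2
difference itself (rather than comparing two separate intervals), one
way is to work from the linear predictor matrix that predict.gam() can
return. A sketch, with the same made-up names as before ('m', 'foo'),
and ignoring the complication that any random-effect columns of the
prediction matrix would also need handling:

```r
library(mgcv)

## grid of C values, replicated for each level of B
newd <- expand.grid(C = seq(min(foo$C), max(foo$C), length = 200),
                    B = levels(foo$B))
Xp <- predict(m, newdata = newd, type = "lpmatrix")

## rows for B1 minus the matching rows for B2 give the difference
## on the linear predictor scale
b1 <- newd$B == levels(foo$B)[1]
b2 <- newd$B == levels(foo$B)[2]
Xdiff <- Xp[b1, ] - Xp[b2, ]

diff <- Xdiff %*% coef(m)
se   <- sqrt(rowSums((Xdiff %*% vcov(m)) * Xdiff))

## approximate 95% pointwise interval on the difference
upr <- diff + 1.96 * se
lwr <- diff - 1.96 * se
```

This is still pointwise and conditional on the smoothing parameters, so
the simultaneous-interval caveat above applies to it just the same.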

Anyway, I think the LRT approach and 'by' smooths might be a better way
to go.

Apologies if you've considered this by the way; I've not followed all
the threads on SIG-Mixed.

HTH

G

> 
> Cheers,
> 
> Mike
> 

-- 
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
 Dr. Gavin Simpson             [t] +44 (0)20 7679 0522
 ECRC, UCL Geography,          [f] +44 (0)20 7679 0565
 Pearson Building,             [e] gavin.simpsonATNOSPAMucl.ac.uk
 Gower Street, London          [w] http://www.ucl.ac.uk/~ucfagls/
 UK. WC1E 6BT.                 [w] http://www.freshwaters.org.uk
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%


