[R] Standard errors from predict.gam versus predict.lm

Simon Wood s.wood at bath.ac.uk
Fri Feb 17 16:51:26 CET 2012


Mike,

Isn't this just an example of the wrong model giving a spurious
impression of precision? or more accurately, precision at the expense of
accuracy?

Here's a linear model example of the same thing...

set.seed(1)
n <- 400
x <- runif(n)-.5
y <- 2+ x*.2+ x^2 + rnorm(n)*.5
m1 <- lm(y~1)
m2 <- lm(y~x+I(x^2))
mean(predict(m1,se=TRUE)$se.fit)
#[1] 0.02641367
mean(predict(m2,se=TRUE)$se.fit)
#[1] 0.04363921

... so the wrong model (m1, a constant) gives much lower se than the
correct model (m2, a quadratic).

best,
Simon

On 17/02/12 14:57, Dunbar, Michael J. wrote:
>
> I've got a small problem.
>
> I have some observational data (environmental samples: abiotic
> explanatory variable and biological response) to which I've fitted
> both a multiple linear regression model and also a gam (mgcv) using
> smooths for each term. The gam clearly fits far better than the lm
> model based on AIC (difference in AIC ~ 8), in addition the adjusted
> R squared for the gam is clearly better.
>
> I then want to make some plots of predicted values from both models
> with confidence intervals. So I can get my predictions usng
> predict(my.model(....se.fit=TRUE))
>
> My problem is that this results in the prediction se's (and
> consequent CI's for the mean prediction) being much wider for the gam
> than for the linear model. This seems rather counter-intuitive given
> that the gam appears to fit better, and hence I will find it hard to
> explain my choice of a gam model in a journal article, despite clear
> non-linearity.
>
> It's not so easy for me to post my own example. The following code
> gives a flavour, clearly in this instance the gam will fit MUCH
> better because it is the generating model. Even in this case, most
> gam se's line above the 1:1 line. In my example, with some
> observational data, the difference between linear and gam fit is not
> so pronounced but the gam still clearly fits better than linear, but
> all gam se's are WAY above their linear equivalents when used to
> predict for representative new data in order to present results as
> interaction plots.
>
>
>
> cheers Mike
>
> require(mgcv) require(MASS)
>
> dat<- gamSim(1,n=200,dist="normal",scale=2) summary(b<-
> gam(y~s(x0)+s(x1)+s(x2)+s(x3),data=dat)) summary(a<- lm(y~x0 + x1 +
> x2 + x3,data=dat))
>
> se.result<- data.frame(linear.se=predict(a, se.fit=TRUE)$se.fit,
> gam.se=predict(b, se.fit=TRUE)$se.fit) with(se.result,
> eqscplot(linear.se, gam.se)) abline(a=0, b=1)


-- 
Simon Wood, Mathematical Science, University of Bath BA2 7AY UK
+44 (0)1225 386603               http://people.bath.ac.uk/sw283



More information about the R-help mailing list