[R] Error in predict.lm() for models without intercept?

Rolf Turner r.turner at auckland.ac.nz
Wed Sep 17 09:06:49 CEST 2014


What's going on is very simple.  I apologize for not getting my thoughts 
organized from the start.

Confidence bands for mu_y (the expected value of y) will have straight 
line edges if and only if there is a *single predictor* in the model.

If there are two or more predictors, the edges will be curved.  It 
doesn't really matter whether or not one of the predictors is constant.

The standard error of the estimate of mu_y is

     s * sqrt(t(x)%*%H%*%x)

where x is the vector of covariate values at which y is being predicted 
and H = (t(X) %*% X)^{-1}, X being the design matrix.

If there is a single predictor "x" then X is n x 1 and H is 1 x 1, i.e. 
a scalar, say h.  Thus the standard error is

    s*sqrt(h)*|x|

and you get the proportionality to the absolute value of x.
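
A minimal sketch of this in R (simulated data, not your data set; the
proportionality is exact):

```r
## For a no-intercept fit with a single predictor, the standard error
## of the fitted value at x0 is s * sqrt(h) * |x0|, with h = 1/sum(x^2).
set.seed(42)
x <- runif(50, 0, 10)
y <- 2 * x + rnorm(50)
fit <- lm(y ~ 0 + x)

x0 <- c(-4, -2, 1, 2, 4)
se <- predict(fit, newdata = data.frame(x = x0), se.fit = TRUE)$se.fit

## se / |x0| is constant, equal to s * sqrt(h)
s <- summary(fit)$sigma
h <- 1 / sum(x^2)
all.equal(unname(se / abs(x0)), rep(s * sqrt(h), length(x0)))  ## TRUE
```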

If there are two predictors, x_1 and x_2, then the standard error has
the form

     s*sqrt(a*x_1^2 + 2*b*x_1*x_2 + c*x_2^2)

i.e. s times the square root of a quadratic form.  This will *not* be a 
straight line function of x_1 when x_2 is held constant, and vice versa.
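
Here is a sketch of the two-predictor case (again on simulated data),
recovering predict()'s standard errors from s and H directly:

```r
## With two predictors the se of the fitted value is s times the square
## root of a quadratic form in (x1, x2): curved in x1 when x2 is fixed.
set.seed(1)
x1 <- runif(50, 0, 10)
x2 <- runif(50, 0, 10)
y  <- x1 + 2 * x2 + rnorm(50)
fit <- lm(y ~ 0 + x1 + x2)

new <- data.frame(x1 = seq(0, 10, by = 2), x2 = mean(x2))
se  <- predict(fit, newdata = new, se.fit = TRUE)$se.fit

## by hand: s * sqrt(x' H x), with H = (t(X) %*% X)^{-1}
X <- cbind(x1, x2)
H <- solve(crossprod(X))
s <- summary(fit)$sigma
M <- as.matrix(new)
all.equal(unname(se), unname(s * sqrt(rowSums((M %*% H) * M))))  ## TRUE
```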

Thus your expectation of seeing straight line edges to confidence bands 
was misguided.  This happens only when there is just *one* predictor.

Again I'm sorry I didn't get my head together about this earlier and 
save you sending me your data set. The issue has nothing whatever to do 
with your particular data set --- it's just basic mathematics.

There is nothing at all wrong with predict.lm().
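
One small point worth noting about the comment in your code: predict.lm()
constructs its "confidence" interval with a t quantile, not the normal
1.96.  A quick check (simulated data again):

```r
## predict.lm's confidence interval is fit +/- qt(0.975, df) * se.fit,
## i.e. a t quantile rather than the normal-approximation 1.96.
set.seed(7)
x <- runif(30, 0, 10)
y <- 3 * x + rnorm(30)
fit <- lm(y ~ 0 + x)

new <- data.frame(x = c(2, 5, 8))
ci  <- predict(fit, newdata = new, interval = "confidence")
pr  <- predict(fit, newdata = new, se.fit = TRUE)

tcrit <- qt(0.975, df = fit$df.residual)
all.equal(unname(ci[, "lwr"]), unname(pr$fit - tcrit * pr$se.fit))  ## TRUE
```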

cheers,

Rolf


P.S.  This is not an R matter at all, so if you have any further 
questions you should email me off-list.

R.

On 17/09/14 15:58, isabella at ghement.ca wrote:

> Hi Rolf,
>
> Thanks very much for your response.   You are right - my simulated
> example works as intended, so it can't be used to get to the bottom of
> this problem (if it is a problem at all).
>
> Here is another example, which is the one I actually worked with when I
> thought maybe something is not quite right in the universe.
>
> The example is based on a real data set (please keep it confidential),
> which is attached to this e-mail as *mod.data.csv*.   This data set
> includes terminal run numbers for salmon, recorded at Age_5, Age_4,
> Age_3 and Age_2.  A model of the form lm(Age_5 ~ 0 + Age_4 + Age_3 +
> Age_2) is fitted to these data and the goal is to visualize the effects
> of Age_4, Age_3 and Age_2 on Age_5.  For biological reasons, this model
> is supposed to not have an intercept.
>
> The attachment Effect_1.pdf shows what these effects look like.  If the
> model has no intercept, should the confidence bands still flare up
> as one moves away from the value of the predictor whose effect we care
> about?
>
> The attachment Effect_2.pdf replicates the effects plots but this time
> using the effects package.
>
> If predict() is correct, should we expect from statistical theory to see
> that the confidence bands have this particular behaviour?  Intuitively,
> I would have expected them to look like a fan plot that starts out at
> zero and then flares up as we move away from zero.
>
> Here is the R code I used to create the two attached plots (with R x64
> 3.1.0).  In this code, Age_5 becomes y, Age_4 becomes x, Age_3 becomes z
> and Age_2 becomes v.
>
> ## read mod.data into R
>
> mod.data <- read.csv("mod.data.csv")
>
> ## compute confidence bands (i.e., fitted values +/- 1.96 standard
> ## errors of fitted values)
>
> y <- mod.data$Age_5
> x <- mod.data$Age_4
> z <- mod.data$Age_3
> v <- mod.data$Age_2
>
> mod <- lm(y ~ 0 + x)
>
> conf.band.x <- predict(mod,
>                        newdata = data.frame(x = seq(from = 0, to = max(x), by = 100)),
>                        interval = "confidence")
>
> ## display confidence bands
>
> conf.band.x <- data.frame(lwr=conf.band.x[,"lwr"],
>                            fit=conf.band.x[,"fit"],
>                            upr=conf.band.x[,"upr"])
>
> matplot(x = seq(from = 0, to = max(x), by = 100), y = conf.band.x, type = "l",
>         xlab = "x", ylab = "y")
> ## abline(v=0,lty=3,col="magenta")
> title("Effect of x on y")
>
> ## linear model with no intercept but with three predictors
>
> par(mfrow=c(2,2))
>
> mod <- lm(y ~ 0 + x + z + v)
>
> ## effect of x on y
>
> conf.band.x <- predict(mod,
>                        newdata = data.frame(x = seq(from = 0, to = max(x), by = 100),
>                                             z = mean(z),
>                                             v = mean(v)),
>                        interval = "confidence")
>
> conf.band.x <- data.frame(lwr = conf.band.x[,"lwr"],
>                           fit = conf.band.x[,"fit"],
>                           upr = conf.band.x[,"upr"])
>
> matplot(x = seq(from = 0, to = max(x), by = 100), y = conf.band.x, type = "l",
>         xlab = "x", ylab = "y")
> abline(v = mean(x), lty = 3, col = "magenta")
> title("Effect of x on y (obtained by setting z and v to their average levels)")
>
>
> ## effect of z on y
> conf.band.z <- predict(mod,
>                        newdata = data.frame(z = seq(from = 0, to = max(z), by = 100),
>                                             x = mean(x),
>                                             v = mean(v)),
>                        interval = "confidence")
>
> conf.band.z <- data.frame(lwr = conf.band.z[,"lwr"],
>                           fit = conf.band.z[,"fit"],
>                           upr = conf.band.z[,"upr"])
>
> matplot(x = seq(from = 0, to = max(z), by = 100), y = conf.band.z, type = "l",
>         xlab = "z", ylab = "y")
> abline(v = mean(z), lty = 3, col = "magenta")
> title("Effect of z on y (obtained by setting x and v to their average levels)")
>
>
> ## effect of v on y
> conf.band.v <- predict(mod,
>                        newdata = data.frame(v = seq(from = 0, to = max(v), by = 100),
>                                             x = mean(x),
>                                             z = mean(z)),
>                        interval = "confidence")
>
> conf.band.v <- data.frame(lwr = conf.band.v[,"lwr"],
>                           fit = conf.band.v[,"fit"],
>                           upr = conf.band.v[,"upr"])
>
> matplot(x = seq(from = 0, to = max(v), by = 100), y = conf.band.v, type = "l",
>         xlab = "v", ylab = "y")
> abline(v = mean(v), lty = 3, col = "magenta")
> title("Effect of v on y (obtained by setting x and z to their average levels)")
>
> ###
> ### Replicate effect plots using effects package
> ###
>
> require(effects)
> plot(allEffects(mod))
>
> Maybe my intuition is playing tricks on me - if you have any idea as to
> what may be going on here, please let me know.



-- 
Rolf Turner
Technical Editor ANZJS


