[Rd] Problems with predict.lm: incorrect SE estimate (PR#7772)

msa at biostat.mgh.harvard.edu msa at biostat.mgh.harvard.edu
Mon Apr 4 19:07:17 CEST 2005


Thanks!

I did not realize that predict.lm calculates the standard
error of the prediction (which goes asumptotically to zero)
rather then prediction error.

The formula for standard erro should omits leding 1 under
square root and with delta method, one should omit rms^2
component. Then everything agrees nicely with predict.lm
output.

Sorry about bothering, I did not get a good hit this time.

Marek Ancukiewicz

> From: Duncan Murdoch <murdoch at stats.uwo.ca>
> Cc: R-bugs at biostat.ku.dk
> Date: Mon, 04 Apr 2005 12:19:32 -0400
> 
> On Mon,  4 Apr 2005 18:01:05 +0200 (CEST), msa at biostat.mgh.harvard.edu
> wrote :
> 
> >Full_Name: Marek Ancukiewicz
> >Version: 2.01
> >OS: Linux
> >Submission from: (NULL) (132.183.12.87)
> >
> >
> >It seems that the the standard error of prediction of the linear regression,
> >caclulated with predict.lm is incorrect. Consider the following example where
> >the standard error is first calculated with predict.lm, then using delta
> >method. and finally, using the formula rms*sqrt(1+1/n+(xp-x0)^2/Sxx). 
> 
> Your formula is incorrect.  You've got the formula for the so called
> "prediction error" (i.e. the stddev of the difference between the
> prediction and a new observation) rather than the "standard error"
> (i.e. the stddev of the prediction).
> 
> Duncan Murdoch
> >
> >Marek Ancukiewicz
> >
> >> n <- 10
> >> x <- 1:n
> >> y <- x
> >> y[c(2,4,6)] <- y[c(2,4,6)] + 0.1
> >> y[c(3,5,7)] <- y[c(3,5,7)] - 0.1
> >> a <- lm(y~x)
> >> rms <- sqrt(sum(a$residuals^2)/(n-2))
> >> s <- covmat(a)*rms^2
> >> xp <- 3
> >> xm <- mean(x)
> >> summary(a)
> >
> >Call:
> >lm(formula = y ~ x)
> >
> >Residuals:
> >     Min       1Q   Median       3Q      Max 
> >-0.10909 -0.07500  0.01091  0.06955  0.10182 
> >
> >Coefficients:
> >            Estimate Std. Error t value Pr(>|t|)    
> >(Intercept) 0.020000   0.058621   0.341    0.742    
> >x           0.996364   0.009448 105.463  7.3e-14 ***
> >---
> >Signif. codes:  0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1 
> >
> >Residual standard error: 0.08581 on 8 degrees of freedom
> >Multiple R-Squared: 0.9993,     Adjusted R-squared: 0.9992 
> >F-statistic: 1.112e+04 on 1 and 8 DF,  p-value: 7.3e-14 
> >
> >> print(predict(a,new=data.frame(x=xp),se.fit=T))
> >$fit
> >[1] 3.009091
> >
> >$se.fit
> >[1] 0.0359752
> >
> >$df
> >[1] 8
> >
> >$residual.scale
> >[1] 0.08581163
> >
> >> print(se.delta.method <- sqrt(s[1,1]+xp^2*s[2,2]+2*xp*s[1,2] + rms^2))
> >[1] 0.09304758
> >> print(se.ss.formula <- rms*sqrt(1+1/n+(xp-xm)^2/sum((x-xm)^2)))
> >[1] 0.09304758
> >
> >______________________________________________
> >R-devel at stat.math.ethz.ch mailing list
> >https://stat.ethz.ch/mailman/listinfo/r-devel
>



More information about the R-devel mailing list