[Rd] predict.lm does not have a weights argument for newdata (PR#8877)

jranke at uni-bremen.de jranke at uni-bremen.de
Thu May 18 17:14:34 CEST 2006


Full_Name: Johannes Ranke
Version: 2.3.0
OS: Linux-i386
Submission from: (NULL) (134.102.60.74)


In the case that predict.lm is used on an object resulting from weighted linear
regression with interval="prediction", the prediction intervals currently depend
on 
the absolute size of object$weights:

data(anscombe); attach(anscombe)
m1 <- lm(y1 ~ x1, w = rep(1,length(x1)))
m2 <- lm(y1 ~ x1, w = rep(3,length(x1)))
predict(m1, data.frame(x1 = 5), interval = "prediction")
predict(m2, data.frame(x1 = 5), interval = "prediction")

This make sense only if the weights are taken to be the numbers of replicates
used
for deriving the x values in newdata, and even then, the given prediction
interval
is only correct if the number of replicates for establishing the x values in
newdata is always one.

The desired behavior would be that the user gives a vector weights.newdata for
newdata, matching the weighting scheme applied for the weighted regression (e.g.
calculated from a variance function).

My education in statistics is only medium, so I am not sure if my proposed
solution is correct. Please check my patch to fix this:

--- lm.R.orig   2006-04-10 00:19:39.000000000 +0200
+++ lm.R        2006-05-18 17:10:29.000000000 +0200
@@ -591,7 +591,8 @@
     function(object, newdata, se.fit = FALSE, scale = NULL, df = Inf,
             interval = c("none", "confidence", "prediction"),
             level = .95,  type = c("response", "terms"),
-            terms = NULL, na.action = na.pass, ...)
+            terms = NULL, na.action = na.pass, ...,
+       weights.newdata = rep(1, length(newdata[[1]])))
 {
     tt <- terms(object)
     if(missing(newdata) || is.null(newdata)) {
@@ -630,6 +631,11 @@
                r <- object$residuals
                w <- object$weights
                rss <- sum(if(is.null(w)) r^2 else r^2 * w)
+    if(!is.null(w) && interval == "prediction" && weights.newdata ==
rep(1,length(newdata[[1]]))) {
+        warning(paste(
+          "To find prediction intervals from linear models with weights,",
+          "you probably want weights.newdata different from one."))
+    }
                df <- n - p
                rss/df
            } else scale^2
@@ -715,7 +721,7 @@
        tfrac <- qt((1 - level)/2, df)
        hwid <- tfrac * switch(interval,
                               confidence = sqrt(ip),
-                              prediction = sqrt(ip+res.var)
+                              prediction = sqrt(ip+res.var/weights.newdata)
                               )
        if(type != "terms") {
            predictor <- cbind(predictor, predictor + hwid %o% c(1, -1))

---------------------------end patch-------------------------------------------

if you apply this to lm.R, you get the same result from both lines:

predict(m1, data.frame(x1 = 5), interval = "prediction") 
predict(m2, data.frame(x1 = 5), interval = "prediction", weights.newdata = 3)

except that you get a warning if you set all newweights to one (default),
because this is probably not what you want for a prediction interval from
weighted regression.

All it does is to (down)scale the part of the variance that each new data point
contributes to the variance of the predicted y.



More information about the R-devel mailing list