[Rd] predict.lm does not have a weights argument for newdata (PR#8877)

Thu May 18 17:14:34 CEST 2006

Full_Name: Johannes Ranke
Version: 2.3.0
OS: Linux-i386
Submission from: (NULL) (134.102.60.74)


In the case that predict.lm is used on an object resulting from weighted linear
regression with interval="prediction", the prediction intervals currently depend
on 
the absolute size of object$weights:

data(anscombe); attach(anscombe)
m1 <- lm(y1 ~ x1, w = rep(1,length(x1)))
m2 <- lm(y1 ~ x1, w = rep(3,length(x1)))
predict(m1, data.frame(x1 = 5), interval = "prediction")
predict(m2, data.frame(x1 = 5), interval = "prediction")

This make sense only if the weights are taken to be the numbers of replicates
used
for deriving the x values in newdata, and even then, the given prediction
interval
is only correct if the number of replicates for establishing the x values in
newdata is always one.

The desired behavior would be that the user gives a vector weights.newdata for
newdata, matching the weighting scheme applied for the weighted regression (e.g.
calculated from a variance function).

My education in statistics is only medium, so I am not sure if my proposed
solution is correct. Please check my patch to fix this:

--- lm.R.orig   2006-04-10 00:19:39.000000000 +0200
+++ lm.R        2006-05-18 17:10:29.000000000 +0200
@@ -591,7 +591,8 @@
     function(object, newdata, se.fit = FALSE, scale = NULL, df = Inf,
             interval = c("none", "confidence", "prediction"),
             level = .95,  type = c("response", "terms"),
-            terms = NULL, na.action = na.pass, ...)
+            terms = NULL, na.action = na.pass, ...,
+       weights.newdata = rep(1, length(newdata[[1]])))
 {
     tt <- terms(object)
     if(missing(newdata) || is.null(newdata)) {
@@ -630,6 +631,11 @@
                r <- object$residuals
                w <- object$weights
                rss <- sum(if(is.null(w)) r^2 else r^2 * w)
+    if(!is.null(w) && interval == "prediction" && weights.newdata ==
rep(1,length(newdata[[1]]))) {
+        warning(paste(
+          "To find prediction intervals from linear models with weights,",
+          "you probably want weights.newdata different from one."))
+    }
                df <- n - p
                rss/df
            } else scale^2
@@ -715,7 +721,7 @@
        tfrac <- qt((1 - level)/2, df)
        hwid <- tfrac * switch(interval,
                               confidence = sqrt(ip),
-                              prediction = sqrt(ip+res.var)
+                              prediction = sqrt(ip+res.var/weights.newdata)
                               )
        if(type != "terms") {
            predictor <- cbind(predictor, predictor + hwid %o% c(1, -1))

---------------------------end patch-------------------------------------------

if you apply this to lm.R, you get the same result from both lines:

predict(m1, data.frame(x1 = 5), interval = "prediction") 
predict(m2, data.frame(x1 = 5), interval = "prediction", weights.newdata = 3)

except that you get a warning if you set all newweights to one (default),
because this is probably not what you want for a prediction interval from
weighted regression.

All it does is to (down)scale the part of the variance that each new data point
contributes to the variance of the predicted y.