[R] Sampling from multivariate multiple regression prediction regions

Iain Pardoe ipardoe at lcbmail.uoregon.edu
Mon May 9 18:43:36 CEST 2005


I'd like to sample multiple response values from a multivariate
regression fit.  For example, suppose I have m=2 responses (y1,y2) and a
single set of predictor variables (z1,z2).  Each response is assumed to
follow its own regression model, and the error terms in each model can
be correlated (as in example 7.10 of section 7.7 of Johnson/Wichern):

> ex7.10 <-
+   data.frame(y1 = c(141.5, 168.9, 154.8, 146.5, 172.8, 160.1, 108.5),
+              y2 = c(301.8, 396.1, 328.2, 307.4, 362.4, 369.5, 229.1),
+              z1 = c(123.5, 146.1, 133.9, 128.5, 151.5, 136.2, 92),
+              z2 = c(2.108, 9.213, 1.905, .815, 1.061, 8.603, 1.125))
> attach(ex7.10)
> f.mlm <- lm(cbind(y1,y2)~z1+z2)
> y.hat <- c(1, 130, 7.5) %*% coef(f.mlm)
> round(y.hat, 2)
         y1     y2
[1,] 151.84 349.63
> qf.z <- t(c(1, 130, 7.5)) %*%
+   solve(t(cbind(1,z1,z2)) %*% cbind(1,z1,z2)) %*%
+   c(1, 130, 7.5)
> round(qf.z, 5)
        [,1]
[1,] 0.36995
> n.sigma.hat <- SSD(f.mlm)$SSD # same as t(resid(f.mlm)) %*%
resid(f.mlm)
> round(n.sigma.hat, 2)
     y1    y2
y1 5.80  5.22
y2 5.22 12.57
> F.quant <- qf(.95,2,3)
> round(F.quant, 2)
[1] 9.55

This gives me all the information I need to calculate a 95% confidence
ellipse for y=(y1,y2) at (z1,z2)=(130,7.5) using JW's equation (7-48)
(written using R syntax, although R cannot "literally" calculate this as
it is written):

(y-y.hat) %*% ((n-r-1) * solve(n.sigma.hat)) %*% t(y-y.hat)
<= (1+qf.z) * (m*(n-r-1)/(n-r-m)) * F.quant

But, what if instead I'd like to sample (y1,y2) values from this
distribution?  I can sample from an F(m,n-r-m) distribution easily
enough, but then how can I transform this to a single point in (y1,y2)
space?

Any ideas would be gratefully appreciated.  Thanks.

Iain Pardoe <ipardoe at lcbmail.uoregon.edu>
University of Oregon




More information about the R-help mailing list