[R] solution to a regression with multiple independent variables

Charles C. Berry cberry at tajo.ucsd.edu
Sun Nov 5 19:22:12 CET 2006


On Sun, 5 Nov 2006, John Sorkin wrote:

> Please forgive a statistics question.
> I know that a simple bivariate linear regression, y=f(x) or in R
> parlance lm(y~x) can be solved using the variance-covariance matrix:
> beta(x)=covariance(x,y)/variance(x). I also know that a linear
> regression with multiple independent variables, for example  y=f(x,z)
> can also be solved using the variance-covariance matrix, but I don't
> know how to do this. Can someone help me go from the variance-covariance
> matrix to the solution of a regression with multiple independent
> variables? It is not clear how one applies the matrix solution b=
> (x'x)-1*x'y to the elements of the variance-covariance matrix, i.e. how
> one gets the required values from the variance-covariance matrix.
> Any help, or suggestions would be appreciated.
>

The "x"s you use above have different meanings, which is a possible source
of confusion. The "x" in "covariance(x,y)/variance(x)" is the independent
variable itself, whereas the "x" in "(x'x)-1*x'y" is the design matrix,
which in the case of a simple linear regression (not "bivariate", BTW)
contains a column of ones and a column of values of the independent variable.
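
As a small illustration of the design-matrix point, the sketch below builds 
the matrix by hand and compares it with what model.matrix() returns for the 
same formula. The data and the seed are made up for illustration and are not 
the values used elsewhere in this message.

```r
# Illustrative data (an assumption, not the data from the transcript)
set.seed(1)
x <- 1:10
y <- rnorm(10) + x

# Design matrix written out by hand: a column of ones and the predictor
X <- cbind("(Intercept)" = 1, x = x)

# Design matrix that R constructs for the formula y ~ x
M <- model.matrix(y ~ x)

# Compare the values only, ignoring dimnames and the "assign" attribute
same_design <- all.equal(unname(X), unname(M), check.attributes = FALSE)
print(same_design)
```

So the "x" in the matrix formula carries the intercept column as well as the 
variable, which is why it cannot be swapped for the "x" in cov(x,y)/var(x).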

I suggest you review the chapter in Draper and Smith's Applied Regression 
Analysis where the transition to the matrix algebraic formulation of 
regression is laid out. IIRC, it is done first for simple linear 
regression.

In concert with this, carry out the computation "longhand" (with the help 
of R) for the simple linear regression using both formulae.

Also do it using a centered version of 'x'.

Here is one version:

> x <- 1:10
> y <- rnorm(10)+x
> cov(x,y)
[1] 10.17249
> var(x)
[1] 9.166667
> X <- cbind(1,x)
> t(X) %*% X
       x
  10  55
x 55 385
> t(X)%*%y
       [,1]
   57.63155
x 408.52594
> cov(x,y)/var(x)
[1] 1.109727
> lm(y~x)

Call:
lm(formula = y ~ x)

Coefficients:
(Intercept)            x
     -0.3403       1.1097

> solve( t(X) %*% X ) %*% t(X) %*% y
        [,1]
  -0.3403414
x  1.1097265
> X2 <- cbind( 1, x- mean(x) )
> t(X2) %*% X2
      [,1] [,2]
[1,]   10  0.0
[2,]    0 82.5
> 82.5/9 ### have you seen this before?
[1] 9.166667
> t(X2) %*% y
          [,1]
[1,] 57.63155
[2,] 91.55244
> 91.55244/9 ### or this??
[1] 10.17249
> solve( t(X2) %*% X2 ) %*% t(X2) %*% y
          [,1]
[1,] 5.763155
[2,] 1.109727
> mean(y)
[1] 5.763155
>

Try it again using a centered version of y.
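
One way that exercise might look is sketched here. The data are regenerated 
with an arbitrary seed, so the numbers will not match the transcript, and the 
names xc, yc and X3 are just illustrative.

```r
# Illustrative data (an assumption, not the data from the transcript)
set.seed(1)
x <- 1:10
y <- rnorm(10) + x

xc <- x - mean(x)   # centered predictor
yc <- y - mean(y)   # centered response

# Same matrix formula as before, now with both variables centered
X3 <- cbind(1, xc)
b_centered <- drop(solve(t(X3) %*% X3) %*% t(X3) %*% yc)

# With both variables centered the intercept is zero (up to rounding)
# and the slope is sum(xc * yc) / sum(xc^2), i.e. cov(x, y) / var(x)
slope_cov <- cov(x, y) / var(x)
print(b_centered)
print(slope_cov)
```

After centering, t(X3) %*% X3 and t(X3) %*% yc hold (n - 1) times the 
variance and covariance, so the matrix formula and the covariance formula 
are the same computation.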

Does this help?

To really get a handle on this, you need to dig into the matrix algebra a 
bit. Rao's Linear Statistical Inference and Its Applications does this 
nicely and shows how matrix operations are carried out on the 
variance-covariance matrices (sorry I don't have the page refs handy, but 
IIRC it is in a later chapter pertaining to multivariate analysis).
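
For the two-predictor case y=f(x,z) in the original question, a minimal 
sketch of working directly from the variance-covariance matrix follows. The 
simulated data, the seed and the names S_pp and S_py are assumptions made for 
illustration. The slopes solve S_pp b = S_py, where S_pp is the block of 
covariances among the predictors and S_py holds the covariances of the 
predictors with y; the intercept is then recovered from the means.

```r
# Simulated data (an assumption, purely for illustration)
set.seed(1)
n <- 50
x <- rnorm(n)
z <- rnorm(n) + 0.5 * x            # deliberately correlated with x
y <- 1 + 2 * x - 3 * z + rnorm(n)

# Full variance-covariance matrix of predictors and response
S <- cov(cbind(x, z, y))
S_pp <- S[c("x", "z"), c("x", "z")]   # predictors with predictors
S_py <- S[c("x", "z"), "y"]           # predictors with response

# Multivariable analogue of cov(x, y) / var(x)
slopes <- solve(S_pp, S_py)

# Intercept follows from the fitted plane passing through the means
intercept <- mean(y) - sum(slopes * c(mean(x), mean(z)))

b_cov <- c(intercept, slopes)
b_lm <- coef(lm(y ~ x + z))
print(unname(b_cov))
print(unname(b_lm))
```

With a single predictor S_pp is just var(x) and S_py is cov(x,y), which 
gives back the formula in the original question.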

Chuck


Comment: "solve( t(X) %*% X ) %*% t(X) %*% y" is NOT the way production 
code for regression problems would be written. If you want to see how 
production code should be written look at the Fortran source for "dqrls" 
in the R source code distribution.
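
A rough comparison of the textbook formula with QR-based fitting from within 
R is sketched below. The data and seed are made up for illustration. qr(), 
qr.coef() and lm.fit() are base R functions; as far as I know lm.fit() is the 
fitter that lm() relies on and it ends up in the compiled "dqrls" routine, 
but treat that detail as my understanding rather than a guarantee.

```r
# Illustrative data (an assumption, not the data from the transcript)
set.seed(1)
x <- 1:10
y <- rnorm(10) + x
X <- cbind(1, x)

# Textbook normal-equations formula: fine for teaching, numerically fragile
b_normal <- drop(solve(t(X) %*% X) %*% t(X) %*% y)

# Least squares via a QR decomposition of X, without forming t(X) %*% X
b_qr <- qr.coef(qr(X), y)

# Lower-level fitting function that works on a design matrix directly
b_lmfit <- lm.fit(X, y)$coefficients

print(rbind(b_normal, b_qr, b_lmfit))
```

On a well-conditioned problem like this one all three agree. The QR route 
matters when the columns of X are nearly collinear, because forming 
t(X) %*% X squares the condition number.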



> Thanks,
> John

Charles C. Berry                         (858) 534-2098
                                         Dept of Family/Preventive Medicine
E mailto:cberry at tajo.ucsd.edu         UC San Diego
http://biostat.ucsd.edu/~cberry/         La Jolla, San Diego 92093-0717


