[R] two methods for regression, two different results

Bill.Venables@csiro.au Bill.Venables at csiro.au
Wed Apr 6 05:25:58 CEST 2005


This is possible if x and z are orthogonal, but in general it doesn't
work, as you have noted.  (If it did work, it would almost amount to a
way of inverting general square matrices by working one row at a time,
with no going back...)

It is possible to fit a bivariate regression using simple linear
regression techniques iteratively like this, but it is a bit more
involved than your two step process.

1. regress y on x and take the residuals: ryx <- resid(lm(y ~ x))

2. regress z on x and take the residuals: rzx <- resid(lm(z ~ x))

3. regress ryx on rzx: fitz <- lm(ryx ~ rzx)

4. this gives you the estimate of the coefficient on z (what you call
b2 below):
b2 <- coef(fitz)[2]

5. regress y - b2*z on x: fitx <- lm(I(y - b2*z) ~ x)

This last step gets you the estimates of b0 and b1.
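For concreteness, here is a self-contained sketch of steps 1-5 on
simulated data (the data-generating coefficients are of course just
illustrative); the estimates agree with the direct fit of lm(y ~ x + z):

```r
## Simulated data: z deliberately correlated with x
set.seed(1)
n <- 100
x <- rnorm(n)
z <- 0.5 * x + rnorm(n)
y <- 1 + 2 * x - 3 * z + rnorm(n)

fit <- lm(y ~ x + z)             # direct bivariate fit, for comparison

ryx  <- resid(lm(y ~ x))         # step 1: residuals of y on x
rzx  <- resid(lm(z ~ x))         # step 2: residuals of z on x
fitz <- lm(ryx ~ rzx)            # step 3
b2   <- coef(fitz)[2]            # step 4: the coefficient on z
fitx <- lm(I(y - b2 * z) ~ x)    # step 5: recovers b0 and b1

rbind(direct  = coef(fit),
      twostep = c(coef(fitx), unname(b2)))
```

The point estimates match to numerical precision; as noted below, the
standard errors and degrees of freedom from the piecewise fits do not.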

None of this works with significances, though, because in going about it
this way you have essentially disguised the degrees of freedom involved.
So you can get the right estimates, but the standard errors,
t-statistics and residual variances are all somewhat inaccurate (though
usually not by much).

If x and z are orthogonal the (curious looking) step 2 is not needed.
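A quick sketch of that shortcut (here z is made orthogonal to x by
construction, purely for illustration): regressing the step-1 residuals
directly on z already reproduces the full-model coefficient.

```r
## Force z orthogonal to x (and mean zero), then skip step 2
set.seed(2)
n <- 100
x <- rnorm(n)
z <- resid(lm(rnorm(n) ~ x))     # orthogonal to x and to the intercept
y <- 1 + 2 * x - 3 * z + rnorm(n)

b2_full  <- coef(lm(y ~ x + z))["z"]                 # direct fit
b2_short <- coef(lm(resid(lm(y ~ x)) ~ z))["z"]      # step 2 skipped
c(full = b2_full, short = b2_short)                  # identical
```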

This kind of idea lies behind some algorithms (e.g. Stevens' algorithm)
for fitting very large regressions essentially by iterative processes to
avoid constructing a huge model matrix.

Bill Venables

-----Original Message-----
From: r-help-bounces at stat.math.ethz.ch
[mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of John Sorkin
Sent: Wednesday, 6 April 2005 12:55 PM
To: r-help at stat.math.ethz.ch
Subject: [R] two methods for regression, two different results


Please forgive a straight stats question, and the informal notation.
 
let us say we wish to perform a linear regression:
y=b0 + b1*x + b2*z
 
There are two ways this can be done: the usual way, as a single
regression,
fit1<-lm(y~x+z)
or by doing two regressions. In the first regression we could have y as
the dependent variable and x as the independent variable 
fit2<-lm(y~x). 
The second regression would be a regression in which the residuals from
the first regression would be the dependent variable, and the
independent variable would be z.
fit2<-lm(fit2$residuals~z)
 
I would think the two methods would give the same p value and the same
beta coefficient for z. They don't. Can someone help me understand why
the two methods do not give the same results. Additionally, could
someone tell me when one method might be better than the other, i.e.
what question does the first method answer, and what question does the
second method answer. I have searched a number of textbooks and have not
found this question addressed.
 
Thanks,
John
 
John Sorkin M.D., Ph.D.
Chief, Biostatistics and Informatics
Baltimore VA Medical Center GRECC and
University of Maryland School of Medicine Claude Pepper OAIC
 
University of Maryland School of Medicine
Division of Gerontology
Baltimore VA Medical Center
10 North Greene Street
GRECC (BT/18/GR)
Baltimore, MD 21201-1524
 
410-605-7119 
-- NOTE NEW EMAIL ADDRESS:
jsorkin at grecc.umaryland.edu

