[R] na.action and simultaneous regressions
Bert Gunter
gunter.berton at gene.com
Wed Jan 3 23:19:39 CET 2007
Ravi:
You misinterpreted my reply -- perhaps I was unclear. I did **not** say that
lm() with a matrix response would do it, but that the apply construction or
an explicit loop would. As you and the poster noted, lm() with a matrix
response fits each column separately, but only to the rows that are complete
in every column.
Bert Gunter
-----Original Message-----
From: Ravi Varadhan [mailto:rvaradhan at jhmi.edu]
Sent: Wednesday, January 03, 2007 2:15 PM
To: 'Bert Gunter'; 'Talbot Katz'; r-help at stat.math.ethz.ch
Subject: RE: [R] na.action and simultaneous regressions
No, Bert, lm doesn't produce a list each of whose components is a separate
fit using "all" the nonmissing data in the column. It is true that the
regressions are performed independently, but when the response matrix is
passed from "lm" on to "lm.fit", only the complete rows are passed, i.e.
rows with no missing values in any column. I looked at the "lm" function, but
it was not obvious to me how to fix it.
In the following toy example, the residual degrees of freedom for the y1
regression should be 18 (20 observations minus 2 coefficients) and those for
y2 should be 15 (17 nonmissing observations minus 2), but both are only 15.
> y1 <- runif(20)
> y2 <- c(runif(17), rep(NA,3))
> x <- rnorm(20)
> summary(lm(cbind(y1,y2) ~ x))
Response y1 :

Call:
lm(formula = y1 ~ x)

Residuals:
     Min       1Q   Median       3Q      Max
-0.52592 -0.22632 -0.00964  0.25117  0.31227

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  0.56989    0.06902   8.257 5.82e-07 ***
x           -0.12325    0.06516  -1.891    0.078 .
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.2798 on 15 degrees of freedom
Multiple R-Squared: 0.1926, Adjusted R-squared: 0.1387
F-statistic: 3.577 on 1 and 15 DF, p-value: 0.07804

Response y2 :

Call:
lm(formula = y2 ~ x)

Residuals:
     Min       1Q   Median       3Q      Max
-0.48880 -0.28552 -0.06022  0.23167  0.54425

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  0.43712    0.07686   5.687 4.31e-05 ***
x            0.10278    0.07257   1.416    0.177
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.3115 on 15 degrees of freedom
Multiple R-Squared: 0.118, Adjusted R-squared: 0.05915
F-statistic: 2.006 on 1 and 15 DF, p-value: 0.1771
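
The shared residual degrees of freedom can also be checked without reading the summary. A minimal sketch, assuming the same toy setup with an arbitrary set.seed(1) added for reproducibility (the estimates will differ from the output above, but the row counts do not depend on the seed):

```r
set.seed(1)                          # arbitrary seed, only for reproducibility
y1 <- runif(20)
y2 <- c(runif(17), rep(NA, 3))       # last three y2 values missing
x  <- rnorm(20)

fit <- lm(cbind(y1, y2) ~ x)         # matrix response, class c("mlm", "lm")
nrow(model.frame(fit))               # 17: rows 18-20 dropped for both columns
df.residual(fit)                     # 15 = 17 - 2, shared by y1 and y2
```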
Ravi.
-----------------------------------------------------------------------------------
Ravi Varadhan, Ph.D.
Assistant Professor, The Center on Aging and Health
Division of Geriatric Medicine and Gerontology
Johns Hopkins University
Ph: (410) 502-2619
Fax: (410) 614-9625
Email: rvaradhan at jhmi.edu
Webpage: http://www.jhsph.edu/agingandhealth/People/Faculty/Varadhan.html
------------------------------------------------------------------------------------
-----Original Message-----
From: r-help-bounces at stat.math.ethz.ch
[mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Bert Gunter
Sent: Wednesday, January 03, 2007 4:46 PM
To: 'Talbot Katz'; r-help at stat.math.ethz.ch
Subject: Re: [R] na.action and simultaneous regressions
As the Help page (?lm) says:
  "If response is a matrix a linear model is fitted separately by
  least-squares to each column of the matrix."
So there's nothing hidden going on "behind the scenes," and
apply(cbind(y1,y2),2,function(z)lm(z~x)) (or an explicit loop, of course)
will produce a list each of whose components is a separate fit using all the
nonmissing data in the column.
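
A minimal sketch of that per-column approach, written here with lapply() over the columns of a data frame instead of apply() (an equivalent formulation chosen for illustration, not the only way). The toy data and the set.seed(1) call are assumptions that mirror Ravi's example:

```r
set.seed(1)                          # arbitrary seed, only for reproducibility
y1 <- runif(20)
y2 <- c(runif(17), rep(NA, 3))       # last three y2 values missing
x  <- rnorm(20)

# One lm() per response column; the default na.action (na.omit) now drops
# only the rows where that particular column is missing
fits <- lapply(data.frame(y1, y2), function(z) lm(z ~ x))

sapply(fits, df.residual)            # y1 18, y2 15
```

Each element of fits is an ordinary lm object, so summary(fits$y2) and similar calls work as usual; the formula in each printed Call will read z ~ x.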
Bert Gunter
Genentech Nonclinical Statistics
South San Francisco, CA 94404
-----Original Message-----
From: r-help-bounces at stat.math.ethz.ch
[mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Talbot Katz
Sent: Wednesday, January 03, 2007 11:56 AM
To: r-help at stat.math.ethz.ch
Subject: [R] na.action and simultaneous regressions
Hi.
I am running regressions of several dependent variables using the same set
of independent variables. The independent variable values are complete, but
each dependent variable has some missing values for some observations; by
default, lm(y1~x) will carry out the regression using only the observations
without missing values of y1. If I do lm(cbind(y1,y2)~x), the default will
be to use only the observations for which neither y1 nor y2 is missing. I'd
like to have the regression for each separate dependent variable use all the
non-missing cases for that variable. I would think that there should be a
way to do that using the na.action option, but I haven't seen this in the
documentation or figured out how to do it on my own. Can it be done this
way, or do I have to code the regressions in a loop? (By the way, since it
restricts to non-missing values in all the variables simultaneously, is this
because it's doing some sort of SUR or other simultaneous equation
estimation behind the scenes?)
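
Both questions can be explained without a hidden estimator. model.frame() applies na.action to whole rows of the model frame, and the response matrix is a single variable there, so any row with an NA in either response column is dropped before lm.fit() sees the data. The sketch below (toy data and seed are illustrative assumptions) checks that each column of the matrix-response coefficients equals an ordinary single-response lm() fit on those complete rows, and that switching to na.action = na.exclude still leaves 15 residual degrees of freedom:

```r
set.seed(1)                          # arbitrary seed, only for reproducibility
y1 <- runif(20)
y2 <- c(runif(17), rep(NA, 3))       # last three y2 values missing
x  <- rnorm(20)

# Matrix response: model.frame() drops every row with an NA in either column
fit_mlm <- lm(cbind(y1, y2) ~ x)

# Ordinary single-response fits restricted to the same complete rows
cc   <- complete.cases(y1, y2)
fit1 <- lm(y1 ~ x, subset = cc)
fit2 <- lm(y2 ~ x, subset = cc)

# Each column of the coefficient matrix is plain column-wise least squares
all.equal(coef(fit_mlm)[, "y1"], coef(fit1))   # TRUE
all.equal(coef(fit_mlm)[, "y2"], coef(fit2))   # TRUE

# na.exclude only pads residuals/fitted values with NA; the fit still uses
# the 17 complete rows, so the residual df stays at 17 - 2 = 15
df.residual(lm(cbind(y1, y2) ~ x, na.action = na.exclude))   # 15
```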
Thanks!
-- TMK --
212-460-5430 home
917-656-5351 cell
______________________________________________
R-help at stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.