[R] Applying lm() to weighted cell.means

Robert Abugov robert at compete.com
Fri Nov 2 19:17:49 CET 2001

Dear R,

We are seeking a version of summary.lm appropriate for a fit of cell means
(with specified weights and variance).

Imagine one had 5 observations from each of two groups and wished to test a
difference in means.  A simple call to lm() and summary.lm() will return the
appropriate test:

raw.data <- data.frame(x = rep(0:1, rep(5, 2)), y = 1:10)
fit.raw <- lm(y ~ x, data=raw.data)

One should be able to perform an equivalent analysis based on the cell means
(with the previously estimated residual variance and assuming independence
of observations).  However, the vanilla summary.lm() does not allow explicit
specification of estimated residual variance and will assume that we have no
residual degrees of freedom:

cell.means <- data.frame(x = unique(raw.data$x),
	y = c(unlist(tapply(raw.data$y, raw.data$x, mean))))
fit.cell <- lm(y ~ x, data=cell.means)

We would like to generate a similar summary from an analysis of cell means
for a weighted least squares problem where we can specify the cell-wise
variance to accommodate heteroscedasticity in my data (and the number of
observations in each cell).  For example, assume var(y|x==0) = var.y0 and
var(y|x==1) = var.y1.  We could get the correct betas using the 'weights'
option in lm().  To calculate summary statistics, we would like to be able
to make a call such as:

summary.cell(fit.cell, cov=diag(c(var.y0,var.y1)), n=table(fit.raw$x))

As much as possible, we would like to avoid re-writing summary.lm().

Thanks in advance,

Bob and Eric

Robert Abugov
Compete, Inc.
Boston, MA  02116
<bob at compete.com>

r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-help-request at stat.math.ethz.ch

More information about the R-help mailing list