[R] Weighted least squares

hadley wickham h.wickham at gmail.com
Tue May 8 11:08:34 CEST 2007


Dear all,

I'm struggling with weighted least squares, where something that I had
assumed to be true appears not to be the case.  Take the following
data set as an example:

df <- data.frame(x = runif(100, 0, 100))
df$y <- df$x + 1 + rnorm(100, sd=15)

I had expected that:

summary(lm(y ~ x, data=df, weights=rep(2, 100)))
summary(lm(y ~ x, data=rbind(df,df)))

would be equivalent, but they are not.  The coefficient estimates
agree, but the standard errors do not.  I suspect the difference is in
how the residual degrees of freedom are calculated - I had expected
them to be based on sum(weights), but they seem to be based on
sum(weights > 0).  This seems unintuitive to me:

summary(lm(y ~ x, data=df, weights=rep(c(0,2), each=50)))
summary(lm(y ~ x, data=df, weights=rep(c(0.01,2), each=50)))
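
To make the comparison concrete, here is a minimal sketch that checks the
residual degrees of freedom directly.  The seed and the names fit_w,
fit_dup, fit_zero and fit_small are assumptions added for illustration;
they are not part of the example above.

```r
set.seed(1)  # assumption: arbitrary seed, only so the run is reproducible
df <- data.frame(x = runif(100, 0, 100))
df$y <- df$x + 1 + rnorm(100, sd = 15)

fit_w   <- lm(y ~ x, data = df, weights = rep(2, 100))
fit_dup <- lm(y ~ x, data = rbind(df, df))

# The coefficient estimates are the same in both fits
all.equal(coef(fit_w), coef(fit_dup))

# The residual degrees of freedom are not
df.residual(fit_w)    # 98:  100 observations - 2 coefficients
df.residual(fit_dup)  # 198: 200 rows - 2 coefficients

# So the standard errors differ by a constant factor
se_w   <- coef(summary(fit_w))[, "Std. Error"]
se_dup <- coef(summary(fit_dup))[, "Std. Error"]
se_w / se_dup         # both ratios equal sqrt(198 / 98)

# Zero weights drop observations from the count; tiny weights do not
fit_zero  <- lm(y ~ x, data = df, weights = rep(c(0, 2), each = 50))
fit_small <- lm(y ~ x, data = df, weights = rep(c(0.01, 2), each = 50))
df.residual(fit_zero)   # 48
df.residual(fit_small)  # 98
```

The ratio of standard errors follows from the residual variance being
divided by 98 in one fit and 198 in the other, which is consistent with
lm() counting observations with non-zero weight rather than summing the
weights.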

What am I missing?  And what is the usual way to do a linear
regression when you have aggregated data?

Thanks,

Hadley


