# [R] Weighted least squares

John Fox jfox at mcmaster.ca
Wed May 9 13:16:37 CEST 2007

Dear Hadley,

> -----Original Message-----
> From: hadley wickham [mailto:h.wickham at gmail.com]
> Sent: Wednesday, May 09, 2007 2:21 AM
> To: John Fox
> Cc: R-help at stat.math.ethz.ch
> Subject: Re: [R] Weighted least squares
>
> Thanks John,
>
> That's just the explanation I was looking for. I had hoped
> that there would be a built-in way of dealing with them in
> R, but obviously not.
>
> Given that explanation, it stills seems to me that the way R
> calculates n is suboptimal, as demonstrated by my second example:
>
> summary(lm(y ~ x, data=df, weights=rep(c(0,2), each=50)))
> summary(lm(y ~ x, data=df, weights=rep(c(0.01,2), each=50)))
>
> the weights are only very slightly different, but the
> estimates of residual standard error are quite different (20
> vs 14 in my run).
>

Observations with zero weight are excluded from the fit entirely, while
those with very small weight (relative to the others) are retained but
contribute little. Consequently you get very similar coefficients but
different numbers of observations, and hence different residual degrees
of freedom and residual standard errors.
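A quick sketch (not part of the original exchange) illustrating both points: the zero-weight exclusion's effect on the residual degrees of freedom, and the suggestion of regenerating the unaggregated data to handle case weights. The seed and the toy aggregated data frame `agg` are assumptions added for reproducibility.

```r
# Sketch, not from the original thread; seed and `agg` are illustrative.
set.seed(1)
df <- data.frame(x = runif(100, 0, 100))
df$y <- df$x + 1 + rnorm(100, sd = 15)

# Zero-weight observations are dropped from the fit entirely, so the
# residual degrees of freedom (and hence the residual standard error)
# differ even though the coefficients barely change:
fit0  <- lm(y ~ x, data = df, weights = rep(c(0,    2), each = 50))
fit01 <- lm(y ~ x, data = df, weights = rep(c(0.01, 2), each = 50))
df.residual(fit0)   # 48: only the 50 positive-weight rows count
df.residual(fit01)  # 98: all 100 rows count

# "Case weights" (counts of identical cases) can be handled by expanding
# the aggregated data back to one row per case before calling lm():
agg <- data.frame(x = c(1, 2, 3), y = c(2.1, 3.9, 6.2), n = c(3, 5, 2))
expanded <- agg[rep(seq_len(nrow(agg)), agg$n), c("x", "y")]
fit_cases <- lm(y ~ x, data = expanded)
df.residual(fit_cases)  # 8: sum(n) observations minus 2 coefficients
```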

I hope this helps,
John

>
> On 5/8/07, John Fox <jfox at mcmaster.ca> wrote:
> >
> > I think that the problem is that the term "weights" has different
> > meanings, which, although they are related, are not quite the same.
> >
> > The weights used by lm() are (inverse-)"variance weights," reflecting
> > the variances of the errors, with observations that have low-variance
> > errors therefore being accorded greater weight in the resulting WLS
> > regression. What you have are sometimes called "case weights," and I'm
> > unaware of a general way of handling them in R, although you could
> > regenerate the unaggregated data. As you discovered, you get the same
> > coefficients with case weights as with variance weights, but different
> > standard errors. Finally, there are "sampling weights," which are
> > inversely proportional to the probability of selection; these are
> > accommodated by the survey package.
> >
> > To complicate matters, this terminology isn't entirely standard.
> >
> > I hope this helps,
> >  John
> >
> > --------------------------------
> > John Fox, Professor
> > Department of Sociology
> > McMaster University
> > Hamilton, Ontario
> > 905-525-9140x23604
> > http://socserv.mcmaster.ca/jfox
> > --------------------------------
> >
> > > -----Original Message-----
> > > From: r-help-bounces at stat.math.ethz.ch
> > > [mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of hadley
> > > wickham
> > > Sent: Tuesday, May 08, 2007 5:09 AM
> > > To: R Help
> > > Subject: [R] Weighted least squares
> > >
> > > Dear all,
> > >
> > > I'm struggling with weighted least squares, where something that
> > > I had assumed to be true appears not to be the case.
> > > Take the following data set as an example:
> > >
> > > df <- data.frame(x = runif(100, 0, 100))
> > > df$y <- df$x + 1 + rnorm(100, sd=15)
> > >
> > > I had expected that:
> > >
> > > summary(lm(y ~ x, data=df, weights=rep(2, 100)))
> > > summary(lm(y ~ x, data=rbind(df,df)))
> > >
> > > would be equivalent, but they are not.  I suspect the difference is
> > > how the degrees of freedom are calculated - I had expected it to be
> > > sum(weights), but it seems to be sum(weights > 0).  This seems
> > > unintuitive to me:
> > >
> > > summary(lm(y ~ x, data=df, weights=rep(c(0,2), each=50)))
> > > summary(lm(y ~ x, data=df, weights=rep(c(0.01,2), each=50)))
> > >
> > > What am I missing?  And what is the usual way to do a linear
> > > regression when you have aggregated data?
> > >
> > > Thanks,
> > >
> > >
> > > ______________________________________________
> > > R-help at stat.math.ethz.ch mailing list
> > > https://stat.ethz.ch/mailman/listinfo/r-help