[R] large survey data set

Thomas Lumley tlumley at u.washington.edu
Fri Jun 28 18:23:49 CEST 2002


On Fri, 28 Jun 2002, Andrew Perrin wrote:

> This is interesting and a bit disturbing. I've been using the weights=
> syntax to assign a case-weighting system in a survey dataset as well. Can
> you send me somewhere for documentation of the differences?

There's some discussion in
   http://www.niesr.ac.uk/niesr/wers98/Purdpap4.pdf

I don't know of a reference book that covers both kinds of weights (they
tend to be treated by non-overlapping groups of people), but the two sets
of formulas should be easy to find.
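For weighted linear regression, the point estimate and the two standard-error formulas (one for variance weights, one for probability weights) can be sketched as below. This is a simplified statement that ignores stratification, clustering and finite-population corrections, which a full design-based survey analysis would include. Here $W = \mathrm{diag}(w_1,\dots,w_n)$, $x_i^\top$ is the $i$-th row of $X$, $p$ is the number of coefficients, and $e_i = y_i - x_i^\top\hat\beta$.

```latex
\hat\beta = (X^\top W X)^{-1} X^\top W y

% Variance weights, assuming Var(y_i) = \sigma^2 / w_i:
\widehat{\mathrm{Var}}_{\mathrm{model}}(\hat\beta) = \hat\sigma^2 (X^\top W X)^{-1},
\qquad \hat\sigma^2 = \frac{1}{n - p} \sum_{i=1}^{n} w_i e_i^2

% Probability weights: sandwich (linearization) estimator
\widehat{\mathrm{Var}}_{\mathrm{sandwich}}(\hat\beta) = (X^\top W X)^{-1}
  \Big( \sum_{i=1}^{n} w_i^2 e_i^2 \, x_i x_i^\top \Big) (X^\top W X)^{-1}
```

Both use the same $\hat\beta$; only the variance estimate differs.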

For linear regression, the estimation in both cases is done by multiplying
each row of X and each element of Y by the square root of its weight and
then doing ordinary least squares. The difference is that with variance
weights this transformed least-squares fit will have constant-variance
residuals, but with probability weights it typically won't, so the usual
model-based standard errors are wrong.
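A minimal pure-Python sketch of the square-root-of-weights transformation described above, for a single predictor plus intercept. The helper names (`inv2`, `matmul2`, `weighted_fit`) are made up for this illustration and are not R functions. It returns the shared point estimate, the usual model-based standard errors (appropriate for variance weights), and sandwich standard errors (one common choice for probability weights, here without any stratification, clustering or finite-sample correction).

```python
import math


def inv2(m):
    """Invert a 2x2 matrix given as [[a, b], [c, d]]."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def matmul2(p, q):
    """Multiply two 2x2 matrices."""
    return [[sum(p[i][k] * q[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def weighted_fit(x, y, w):
    """Hypothetical helper: weighted fit of y = b0 + b1*x via the sqrt(w) transform."""
    n = len(y)
    # Multiply each design row (1, x_i) and each y_i by sqrt(w_i).
    rows = [(math.sqrt(wi), math.sqrt(wi) * xi) for xi, wi in zip(x, w)]
    ys = [math.sqrt(wi) * yi for yi, wi in zip(y, w)]
    # Ordinary least squares on the transformed data: solve (X'X) b = X'y.
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(2)] for i in range(2)]
    xty = [sum(r[i] * yt for r, yt in zip(rows, ys)) for i in range(2)]
    xtx_inv = inv2(xtx)
    beta = [sum(xtx_inv[i][j] * xty[j] for j in range(2)) for i in range(2)]
    # Residuals of the transformed fit (equal to sqrt(w_i) * e_i).
    resid = [yt - (r[0] * beta[0] + r[1] * beta[1]) for r, yt in zip(rows, ys)]
    # Model-based SEs: assume the transformed residuals have constant variance.
    sigma2 = sum(e * e for e in resid) / (n - 2)
    se_model = [math.sqrt(sigma2 * xtx_inv[i][i]) for i in range(2)]
    # Sandwich SEs: bread * meat * bread, meat = sum of w_i^2 e_i^2 x_i x_i'.
    meat = [[sum(e * e * r[i] * r[j] for e, r in zip(resid, rows))
             for j in range(2)] for i in range(2)]
    v_sand = matmul2(matmul2(xtx_inv, meat), xtx_inv)
    se_sandwich = [math.sqrt(v_sand[i][i]) for i in range(2)]
    return beta, se_model, se_sandwich


if __name__ == "__main__":
    x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
    y = [2.1, 3.9, 6.2, 8.1, 9.7, 12.5]
    w = [1.0, 3.0, 0.5, 2.0, 4.0, 1.5]
    beta, se_model, se_sandwich = weighted_fit(x, y, w)
    print("coefficients:", beta)
    print("model-based SE:", se_model)
    print("sandwich SE:", se_sandwich)
```

The coefficients are identical whichever interpretation of the weights you adopt; only the two SE columns can disagree, and they diverge when the transformed residuals are far from constant-variance.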

For linear regression there will only be serious problems if you have a
variable that strongly predicts both the outcome and the weights and has a
skewed distribution. For logistic or Poisson regression, where there is no
free dispersion parameter (it is fixed at 1 rather than estimated), the
problems can be worse.
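One consequence of the missing dispersion parameter shows up even in an intercept-only model. In the sketch below (pure Python; the function names are hypothetical), multiplying every probability weight by a constant, for instance to make them sum to a population total, leaves the linear-model standard error unchanged, because the estimated residual variance scales up to compensate. The Poisson standard error, with dispersion fixed at 1, shrinks by the square root of that constant. This illustrates only the scale aspect of the problem, not the skewness issue mentioned above.

```python
import math


def linear_mean_se(y, w):
    """Weighted mean and its model-based SE, with the residual variance estimated."""
    n = len(y)
    sw = sum(w)
    m = sum(wi * yi for wi, yi in zip(w, y)) / sw
    # Estimated sigma^2 grows with the weights, cancelling the growth of sum(w).
    sigma2 = sum(wi * (yi - m) ** 2 for wi, yi in zip(w, y)) / (n - 1)
    return m, math.sqrt(sigma2 / sw)


def poisson_log_mean_se(y, w):
    """Weighted Poisson MLE of log(mean) and its model-based SE, dispersion fixed at 1."""
    sw = sum(w)
    mu = sum(wi * yi for wi, yi in zip(w, y)) / sw
    # Fisher information for log(mu) is mu * sum(w); nothing rescales it.
    return math.log(mu), math.sqrt(1.0 / (mu * sw))


if __name__ == "__main__":
    y = [0.0, 1.0, 3.0, 2.0, 5.0]
    w = [1.0, 2.0, 1.0, 3.0, 1.0]
    big_w = [100.0 * wi for wi in w]
    print("linear SE:", linear_mean_se(y, w)[1], linear_mean_se(y, big_w)[1])
    print("Poisson SE:", poisson_log_mean_se(y, w)[1], poisson_log_mean_se(y, big_w)[1])
```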

	-thomas

