[R] large survey data set

Andrew Perrin clists at perrin.socsci.unc.edu
Fri Jun 28 17:42:51 CEST 2002

This is interesting and a bit disturbing. I've also been using the weights=
argument to apply case weights in a survey dataset. Can you point me to
documentation of the differences?


Andrew J Perrin - http://www.unc.edu/~aperrin
Assistant Professor of Sociology, U of North Carolina, Chapel Hill
clists at perrin.socsci.unc.edu * andrew_perrin (at) unc.edu

On Thu, 27 Jun 2002, Thomas Lumley wrote:

> On Thu, 27 Jun 2002, Andrew Perrin wrote:
> > The lm function (for linear modelling aka linear regression) includes
> > case weights with a simple syntax:
> >
> > foo<-lm(dependent ~ indep + indep + ... ,
> > 	data = <data object>,
> > 	weights = <weight variable>)
> Yes, but that isn't what he means by weights...
>
> The standard regression weights are variance weights: a weight of 2
> denotes an observation with half the variance of a weight of 1.
>
> In survey sampling (and in related missing data and causal inference
> models) you need probability weights: a weight of 2 means an observation
> had half the chance of being sampled.  You get the same regression
> coefficients (more or less) but quite different standard errors.
>
> The `model-robust' sandwich variance estimators give about the right
> standard errors (as long as the sampling fraction is small). These are
> built in to the survival models, but not in most other software. They are
> pretty easy to calculate but with a 20% sample they probably aren't going
> to work well.
>
> 	-thomas
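
A minimal sketch of the distinction described in the quoted message, in base
R on simulated data. The population, the sampling probabilities, and the names
pop, samp, and pw are all hypothetical. The sandwich estimator is the plain
HC0 form, which may not be the exact variant the quoted message has in mind,
and it ignores stratification, clustering, and any finite-population
correction.

```r
set.seed(2002)

# Hypothetical population in which units with x > 0 are oversampled.
N <- 20000
pop <- data.frame(x = rnorm(N))
pop$y <- 1 + 2 * pop$x + rnorm(N, sd = 1 + abs(pop$x))
pop$p <- ifelse(pop$x > 0, 0.02, 0.005)   # sampling probabilities

# Draw the sample; the probability weight is the inverse sampling probability.
samp <- pop[runif(N) < pop$p, ]
samp$pw <- 1 / samp$p

# lm() treats weights= as variance weights when it computes standard errors.
fit <- lm(y ~ x, data = samp, weights = pw)
se_model <- sqrt(diag(vcov(fit)))

# Sandwich (HC0) variance: (X'WX)^-1 [X' diag(w^2 e^2) X] (X'WX)^-1
X <- model.matrix(fit)
w <- samp$pw
e <- residuals(fit)                   # unweighted residuals, y - X b
bread <- solve(crossprod(X, w * X))   # (X'WX)^-1
meat <- crossprod(X * (w * e))        # sum of w_i^2 e_i^2 x_i x_i'
V <- bread %*% meat %*% bread
se_sandwich <- sqrt(diag(V))

# Same coefficients under either reading of the weights; the two
# standard-error columns are what differ.
print(cbind(estimate = coef(fit), se_model, se_sandwich))
```

The coefficients come from the same weighted least-squares fit in both cases;
only the variance calculation changes.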
