[R] weighting (survey) data
    Thomas Lumley 
    tlumley at u.washington.edu
       
    Mon Jan 18 00:24:19 CET 2010
    
    
  
On Sun, 17 Jan 2010, Vera wrote:
> 2010/1/16 Thomas Lumley <tlumley at u.washington.edu>:
>> On Sat, 16 Jan 2010, Vera wrote:
>>
>>> Thanks for your help so far, everyone.
>>>
>>> Thomas: I haven't looked very deep into the survey package yet, so I
>>> don't know if what I'm looking for is actually missing or if I just
>>> haven't found it yet.
>>> What is "missing", from my point of view at the moment, is some kind
>>> of global weighting function that allows me to set a weight and then
>>> just perform different kinds of analyses without thinking about it any
>>> more.
>>
>> There isn't anything like this, because it isn't possible. Some analyses
>> can't sensibly be done with sampling weights; for others you can get point
>> estimates but it is hard to get standard errors.
>
> I see, having read some papers about weighting now. What I described
> is possible in other statistics software, so I was kind of misled
> into thinking it couldn't be that complicated. (I guess all analyses,
> where appropriate, are done with weighted data; if you don't specify a
> weight variable, all weights are 1 by default).
>
In fact, what you described is not possible in other statistical software either, because it simply cannot be done.  Stata comes closest, but even there not every analysis can be done with sampling weights.
The WEIGHT BY instruction in SPSS gives you frequency weights, not sampling weights. A frequency weight of, e.g., 10 means that the data contain 10 identical copies of the observation, stored in a single record to save space.  Frequency weights are easy to implement, but if what you really have are sampling weights, they usually give the wrong p-values and confidence intervals, and sometimes the wrong point estimates.
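To make the difference concrete, here is a minimal sketch in plain Python (not R, and not the survey package's implementation). The data, the function names, and the simple design are all assumptions made for illustration: four records, each with weight 10, treated as a single-stage sample with no strata or clusters. The sampling-weight standard error uses the usual with-replacement linearization for a weighted mean; a real design would need the stratum and cluster information as well.

```python
import math

def weighted_mean(y, w):
    """Weighted mean; the same formula under either reading of the weights."""
    return sum(wi * yi for yi, wi in zip(y, w)) / sum(w)

def se_frequency(y, w):
    """SE of the mean when each weight is read as a count of identical copies.

    The effective sample size is sum(w), as if the data set had been expanded.
    """
    n_expanded = sum(w)
    m = weighted_mean(y, w)
    var = sum(wi * (yi - m) ** 2 for yi, wi in zip(y, w)) / (n_expanded - 1)
    return math.sqrt(var / n_expanded)

def se_sampling(y, w):
    """Linearization SE when the weights are sampling weights.

    Assumes (for illustration only) n independent draws, no strata, no clusters.
    The sample size is the number of records, not the sum of the weights.
    """
    n = len(y)
    total_w = sum(w)
    m = weighted_mean(y, w)
    # Linearized contributions of each record to the weighted mean; they sum to 0.
    z = [wi * (yi - m) / total_w for yi, wi in zip(y, w)]
    return math.sqrt(n / (n - 1) * sum(zi ** 2 for zi in z))

# Hypothetical data: four observed records, each with weight 10.
y = [2.0, 4.0, 6.0, 8.0]
w = [10.0, 10.0, 10.0, 10.0]

print("weighted mean:      ", weighted_mean(y, w))
print("SE, frequency weights:", se_frequency(y, w))
print("SE, sampling weights: ", se_sampling(y, w))
```

In this toy case the point estimate is the same either way, but the frequency-weight calculation behaves as if there were 40 observations rather than 4, so its standard error comes out much smaller than the sampling-weight one.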
There's a nice description of what is available in some commercial packages at http://www.ats.ucla.edu/stat/SPSS/faq/weights.htm
       -thomas
Thomas Lumley			Assoc. Professor, Biostatistics
tlumley at u.washington.edu	University of Washington, Seattle