[R] survey package question
tlumley at uw.edu
Thu Oct 11 21:33:27 CEST 2012
On Fri, Oct 12, 2012 at 6:56 AM, Sebastián Daza
<sebastian.daza at gmail.com> wrote:
> I have got a cluster sample using an election dataset where I already
> had the final results of a county-specific election. I am trying to
> figure out what would be the best sampling design for my data.
> The structure of the dataset is:
> 1) polling station (generally a school where people vote; a county has,
> for example, 15 polling stations)
> 2) inside each polling station, there are voting units, where people
> actually vote (on average there are about 40 voting units per polling station)
> 3) for each voting unit I have the total votes by candidate (e.g.,
> candidate 1 =322, candidate 2=122, candidate 3= 89)
> The initial sampling design is:
> 1) selection of 5 polling stations PPS (based on number of voters)
> 2) selection of 10 voting units (SRS)
> I am interested in estimating the proportion of votes by candidate
> (let's assume we have 3 candidates). My naive estimate would be:
> votes for candidate 1 / all valid votes = proportion
> candidate 1= 2132 / 10874= .1961
> candidate 2= 5323 / 10874= .4895
> candidate 3= 3419 / 10874= .3144
> In this case, the unit of analysis is voters (or votes).
> If I specify the sampling design using the survey package in this way...
> design <- svydesign(id=~station + unit, fpc=~probstation + probunit,
> data=sample, pps="brewer")
> svyciprop(~I(candidate1/totalVotes), design)
> ... I am assuming that the unit of analysis is the voting unit, right?
> and I am estimating an average among voting units?
You want a ratio estimator:
svyratio(~candidate1, ~totalVotes, design)
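
To see why the two approaches differ, here is a minimal base-R sketch of the point estimates only. The data and the weights `w` are made up for illustration (assumed to be 1 / (probstation * probunit)); it does not use the survey package and does not reproduce the design-based standard errors that svyratio() reports.

```r
# Toy data (made up for illustration): four sampled voting units.
units <- data.frame(
  candidate1 = c(30, 80, 10, 100),   # votes for candidate 1 in each unit
  totalVotes = c(100, 400, 20, 200), # valid votes in each unit
  w          = c(2, 2, 5, 5)         # hypothetical sampling weights
)

# Ratio estimator: weighted total for the candidate divided by the
# weighted total of valid votes. The unit of analysis is the vote.
ratio_est <- with(units, sum(w * candidate1) / sum(w * totalVotes))

# Weighted mean of the per-unit proportions. The unit of analysis is
# the voting unit, so small units count as much as large ones.
mean_of_props <- with(units, sum(w * candidate1 / totalVotes) / sum(w))

print(c(ratio = ratio_est, mean_of_proportions = mean_of_props))
```

The two numbers agree only when the proportion is unrelated to the size of the voting unit; with units of unequal size they generally differ, and the ratio of totals is the one that corresponds to a share of all votes.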
Professor of Biostatistics
University of Auckland