> I have drawn a cluster sample from an election dataset for which I already
> have the final results of a county election. I am trying to work out how
> best to specify the sampling design for my data.
> The structure of the dataset is:
> 1) polling stations (generally schools where people vote; a county
> might, for example, have 15 polling stations)
> 2) inside each polling station there are voting units, where people
> actually vote (on average about 40 voting units per polling
> station)
> 3) for each voting unit I have the total votes by candidate (e.g.,
> candidate 1 = 322, candidate 2 = 122, candidate 3 = 89)
> The initial sampling design is:
> 1) selection of 5 polling stations with probability proportional to
> size (PPS, size = number of voters)
> 2) selection of 10 voting units within each selected station by simple
> random sampling (SRS)
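A minimal base-R sketch of the inclusion probabilities this two-stage design implies, assuming an exact πps first stage (each station's inclusion probability is 5 times its share of the county's voters) and SRS of 10 units within each selected station. The frame numbers and the columns voters, n_units, prob_overall and weight are hypothetical; probstation and probunit reuse the names from the svydesign() call further down. For PPS without replacement, the svydesign() help page asks for these per-stage probabilities in the fpc argument.

```r
# Hypothetical sampling frame: 15 polling stations in one county.
# 'voters' and 'n_units' are made-up numbers for illustration only.
frame <- data.frame(
  station = 1:15,
  voters  = c(9000, 12000, 8000, 15000, 11000, 10000, 7000, 13000,
              9500, 14000, 8500, 12500, 10500, 11500, 9000),
  n_units = c(30, 40, 27, 50, 37, 33, 23, 43, 32, 47, 28, 42, 35, 38, 30)
)

n_stations <- 5   # stations drawn with PPS (first stage)
m_units    <- 10  # voting units drawn by SRS within each station (second stage)

# First-stage inclusion probability under a pi-ps design:
# pi_i = n * voters_i / sum(voters); only valid while every value is <= 1.
frame$probstation <- n_stations * frame$voters / sum(frame$voters)

# Second-stage inclusion probability: SRS of m_units out of n_units.
frame$probunit <- m_units / frame$n_units

# Overall inclusion probability and sampling weight of a voting unit.
frame$prob_overall <- frame$probstation * frame$probunit
frame$weight <- 1 / frame$prob_overall

frame[, c("station", "probstation", "probunit", "weight")]
```

Each sampled voting unit would then carry its station's probstation and its own probunit as columns in the sample data.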
> I am interested in estimating the proportion of the vote won by each
> candidate (let's assume there are 3 candidates). My naive estimate would be:
>
> e.g.
>
> candidate 1 = 2132 / 10874 = .1961
> candidate 2 = 5323 / 10874 = .4895
> candidate 3 = 3419 / 10874 = .3144
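The naive estimate divides each candidate's summed sample votes by the sample total, giving every sampled vote the same weight. A quick base-R check of that arithmetic follows. Assuming an exact πps first stage, a voting unit's overall weight under this design is proportional to M_i / V_i (voting units in station i over voters in station i), so the unweighted ratio matches the design-weighted one only when every station has the same average number of voters per voting unit.

```r
# Candidate vote totals summed over all sampled voting units (from the post).
votes <- c(candidate1 = 2132, candidate2 = 5323, candidate3 = 3419)
total <- sum(votes)                # overall sample total of votes
naive <- round(votes / total, 4)   # unweighted share of the sampled votes
naive
```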
> In this case, the unit of analysis is voters (or votes).
>
> If I specify the sampling design with the survey package like this:
> design <- svydesign(id = ~station + unit, fpc = ~probstation + probunit,
>                     data = sample, pps = "brewer")
> am I assuming that the unit of analysis is the voting unit? And am I
> estimating an average over voting units?
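With one row per voting unit, each row is what svydesign() treats as an observation, and svymean(~cand1, design) computes a weighted mean over rows, i.e. an average number of votes per voting unit. The base-R sketch below, on made-up numbers, contrasts that with a weighted mean of per-unit vote shares and with the share of all votes cast. The data frame smp, its weight column w (standing in for 1 / (probstation * probunit)) and the columns cand1 and total are hypothetical, and only 2 units per station are shown to keep it short.

```r
# Hypothetical sample: 5 stations x 2 voting units (the real design has 10).
smp <- data.frame(
  station = rep(1:5, each = 2),
  w       = rep(c(20, 30, 25, 40, 15), each = 2),   # overall sampling weights
  cand1   = c(60, 80, 50, 40, 90, 70, 30, 50, 100, 60),
  total   = c(300, 320, 250, 260, 350, 300, 200, 240, 400, 280)
)

# (a) Weighted mean over voting units of candidate 1's vote count.
mean_votes_per_unit <- sum(smp$w * smp$cand1) / sum(smp$w)

# (b) Weighted mean over voting units of each unit's vote share:
#     every voting unit counts equally, however many people voted there.
mean_unit_share <- sum(smp$w * smp$cand1 / smp$total) / sum(smp$w)

# (c) Ratio of estimated totals: the share of all votes cast,
#     so every voter counts equally.
ratio_share <- sum(smp$w * smp$cand1) / sum(smp$w * smp$total)

c(mean_votes_per_unit = mean_votes_per_unit,
  mean_unit_share     = mean_unit_share,
  ratio_share         = ratio_share)
```

Only (c) estimates the proportion of votes; (a) and (b) are averages over voting units, and (b) differs from (c) whenever vote shares are related to the number of votes cast in a unit.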
You want a ratio estimator: the estimated total votes for each candidate
divided by the estimated total of all votes.
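In the survey package the function for this is svyratio(), e.g. svyratio(~cand1 + cand2 + cand3, ~total, design), where total is a hypothetical column holding all votes cast in the voting unit. To show what it computes, here is a hand-rolled base-R sketch of the ratio estimator R = Y/X with a linearisation (Taylor series) standard error. It is an assumption-laden approximation: the variance treats the first-stage stations as drawn with replacement and ignores the finite population correction, so its standard errors will not exactly match svyratio() on a design declared with pps = "brewer". The function ratio_with_se and the data are hypothetical.

```r
# Hypothetical sample: 5 stations x 2 voting units, with weights w.
smp <- data.frame(
  station = rep(1:5, each = 2),
  w       = rep(c(20, 30, 25, 40, 15), each = 2),
  cand1   = c(60, 80, 50, 40, 90, 70, 30, 50, 100, 60),
  cand2   = c(150, 160, 120, 140, 160, 150, 110, 120, 200, 140),
  cand3   = c(90, 80, 80, 80, 100, 80, 60, 70, 100, 80)
)
smp$total <- smp$cand1 + smp$cand2 + smp$cand3   # all votes in the unit

ratio_with_se <- function(y, x, w, cluster) {
  # Point estimate: ratio of weighted (Horvitz-Thompson-type) totals.
  Y_hat <- sum(w * y)
  X_hat <- sum(w * x)
  R_hat <- Y_hat / X_hat
  # Linearised residuals for the ratio.
  z <- (y - R_hat * x) / X_hat
  # Weighted residual total for each first-stage cluster (polling station).
  t_cl <- tapply(w * z, cluster, sum)
  n_cl <- length(t_cl)
  # With-replacement ("ultimate cluster") variance across stations.
  v <- n_cl / (n_cl - 1) * sum((t_cl - mean(t_cl))^2)
  c(ratio = R_hat, se = sqrt(v))
}

est <- sapply(c("cand1", "cand2", "cand3"), function(v)
  ratio_with_se(smp[[v]], smp$total, smp$w, smp$station))
round(est, 4)
```

Because the three candidates' counts add up to total, the three estimated shares sum to 1.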

-thomas

