[R] survey package question

Thu Oct 11 21:33:27 CEST 2012

On Fri, Oct 12, 2012 at 6:56 AM, Sebastián Daza
<sebastian.daza at gmail.com> wrote:
> Hello,
>
> I have drawn a cluster sample from an election dataset for which I
> already have the final results of a county-specific election. I am
> trying to figure out the best way to specify the sampling design for
> my data.
>
> The structure of the dataset is:
>
> 1) polling stations (generally schools where people vote; a county
> has, for example, 15 polling stations)
> 2) inside each polling station, there are voting units, where people
> actually vote (on average there are about 40 voting units per polling
> station)
> 3) for each voting unit I have the total votes by candidate (e.g.,
> candidate 1 =322, candidate 2=122, candidate 3= 89)
>
> The initial sampling design is:
> 1) selection of 5 polling stations by PPS (based on number of voters)
> 2) selection of 10 voting units (SRS)
>
> I am interested in estimating the proportion of votes by candidate
> (let's assume we have 3 candidates). My naive estimate would be:
>
> votes for candidate 1 / all valid votes = proportion
>
> e.g.
>
> candidate 1= 2132 / 10874= .1961
> candidate 2= 5323 / 10874= .4895
> candidate 3= 3419 / 10874= .3144
>
> In this case, the unit of analysis is voters (or votes).
>
> If I specify the sampling design using the survey package in this way...
>
> design <- svydesign(id=~station + unit, fpc=~probstation + probunit,
> data=sample, pps="brewer")
>
> svyciprop(~I(candidate1/totalVotes), design)
>
> ... I am assuming that the unit of analysis is the voting unit, right?
> And that I am estimating an average among voting units?
>

You want a ratio estimator:

svyratio(~candidate1, ~totalVotes, design)
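
To see why a ratio of totals is not the same as an average of per-unit
proportions, here is a minimal base-R sketch. The three voting units, their
vote counts, and the weights `w` are made-up illustration values, not taken
from the data described above. With a real design object, svyratio() would
compute the weighted ratio along with its standard error.

```r
# Hypothetical toy data: three sampled voting units (values invented for illustration)
candidate1 <- c(322, 40, 150)   # votes for candidate 1 in each unit
totalVotes <- c(533, 200, 300)  # valid votes in each unit
w          <- c(12, 12, 30)     # assumed sampling weights (1 / inclusion probability)

# Ratio estimator: estimated total votes for candidate 1
# divided by estimated total valid votes
ratio_of_totals <- sum(w * candidate1) / sum(w * totalVotes)

# Weighted mean of per-unit proportions: each unit's proportion
# enters without regard to how many votes the unit holds
mean_of_ratios <- weighted.mean(candidate1 / totalVotes, w)

print(c(ratio_of_totals = ratio_of_totals, mean_of_ratios = mean_of_ratios))
```

The first quantity treats votes as the unit of analysis; the second treats
voting units as the unit of analysis. The two generally differ when voting
units vary in size.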


   -thomas

-- 
Thomas Lumley
Professor of Biostatistics
University of Auckland