[R] survey package question

Thu Oct 11 19:56:53 CEST 2012

Hello,

I have drawn a cluster sample from an election dataset for which I
already have the final results of a county-level election. I am trying
to figure out what the best sampling design for my data would be.

The structure of the dataset is:

1) polling stations (generally schools where people vote; a county
has, for example, 15 polling stations)
2) inside each polling station there are voting units, where people
actually vote (on average there are about 40 voting units per polling
station)
3) for each voting unit I have the total votes by candidate (e.g.,
candidate 1 = 322, candidate 2 = 122, candidate 3 = 89)

The initial sampling design is:
1) selection of 5 polling stations PPS (based on number of voters)
2) selection of 10 voting units (SRS)

I am interested in estimating the proportion of votes by candidate
(let's assume we have 3 candidates). My naive estimate would be:

votes for candidate 1 / all valid votes = proportion

e.g.

candidate 1= 2132 / 10874= .1961
candidate 2= 5323 / 10874= .4895
candidate 3= 3419 / 10874= .3144

In this case, the unit of analysis is voters (or votes).
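
For concreteness, the naive calculation above can be reproduced in a
few lines of base R. This is only a sketch using the three totals
quoted above; the names `votes` and `naive` are just illustrative, and
the calculation ignores the sampling design entirely.

```r
# Sample totals by candidate, as quoted above (names are illustrative)
votes <- c(candidate1 = 2132, candidate2 = 5323, candidate3 = 3419)

# Naive proportion: votes for each candidate / all valid votes
naive <- round(votes / sum(votes), 4)
print(naive)
```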

If I specify the sampling design using the survey package in this way...

design <- svydesign(id=~station + unit, fpc=~probstation + probunit,
data=sample, pps="brewer")

svyciprop(~I(candidate1/totalVotes), design)

... am I right that the unit of analysis is then the voting unit, and
that I am estimating an average across voting units?

Should I expand my dataset to the individual level (voters), or do I
just need to include a weight for each voting unit according to its
number of voters? In other words, is there a way to estimate
"votes for candidate 1 / all valid votes = proportion" directly with
the survey package, or do I have to expand the dataset to the voter
level and then estimate the proportion using svymean with the
corresponding design?
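
To make the distinction I am asking about concrete, here is a toy
example in base R. The three voting units and their numbers are made
up (they are not my data), the object names are hypothetical, and the
sampling weights are ignored; the only point is that the average of
unit-level proportions and the ratio of totals are different
quantities, and that weighting each unit by its number of votes turns
the first into the second.

```r
# Made-up voting units, NOT real data: votes for candidate 1 and all valid votes
toy <- data.frame(
  candidate1 = c(50, 10, 200),
  totalVotes = c(100, 100, 400)
)

# Proportion for candidate 1 within each voting unit
unit_prop <- toy$candidate1 / toy$totalVotes

# (a) Unit of analysis = voting unit: plain average of unit proportions
mean_of_props <- mean(unit_prop)

# (b) Unit of analysis = vote: total votes for candidate 1 / total valid votes
ratio_of_totals <- sum(toy$candidate1) / sum(toy$totalVotes)

# (c) Average of unit proportions, weighted by votes cast in each unit
weighted_mean <- weighted.mean(unit_prop, w = toy$totalVotes)

print(c(mean_of_props = mean_of_props,
        ratio_of_totals = ratio_of_totals,
        weighted_mean = weighted_mean))
```

In this toy case (b) and (c) agree and differ from (a), which is why I
suspect that the svyciprop call above is not estimating the quantity I
want.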

I would appreciate any advice or help.

Sebastian