[R] survey package question

Sebastián Daza sebastian.daza at gmail.com
Fri Oct 12 02:51:26 CEST 2012


Hello Thomas,

I used both svymean (on the expanded sample, one row per person) and
svyratio (at the voting-unit level), with the same design:

design <-svydesign(id=~station + unit, fpc=~probstation+probunits,
data=sample, pps="brewer")

I got different results using the same sample:

svyratio (voting unit)

           Ratio       2.5%      97.5%     Result
Cand1 0.05252871 0.04537301 0.05968441 0.05181146
Cand2 0.47226973 0.45215097 0.49238849 0.49041590
Cand3 0.47520156 0.45460831 0.49579482 0.45777264

svymean (expanded sample, individuals or votes)

           Mean          SE      2.5 %     97.5 %    Results
Cand1 0.0528433 0.004562755 0.04390047 0.06178614 0.05181146
Cand2 0.4717504 0.010201398 0.45175605 0.49174480 0.49041590
Cand3 0.4754063 0.010429222 0.45496538 0.49584718 0.45777264
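
As a quick check on the two tables above, the interval widths can be
computed directly from the posted bounds. This is a minimal base-R sketch;
the object names are made up for illustration, and the numbers are copied
from the output above.

```r
# Confidence bounds copied from the two tables above (lower, upper)
svyratio_ci <- rbind(Cand1 = c(0.04537301, 0.05968441),
                     Cand2 = c(0.45215097, 0.49238849),
                     Cand3 = c(0.45460831, 0.49579482))
svymean_ci  <- rbind(Cand1 = c(0.04390047, 0.06178614),
                     Cand2 = c(0.45175605, 0.49174480),
                     Cand3 = c(0.45496538, 0.49584718))

# Width of each interval: upper bound minus lower bound
widths <- cbind(svyratio = svyratio_ci[, 2] - svyratio_ci[, 1],
                svymean  = svymean_ci[, 2]  - svymean_ci[, 1])
print(round(widths, 5))
```

On these numbers the svyratio interval is clearly narrower only for Cand1;
for Cand2 and Cand3 the two widths differ by roughly 0.0003 or less.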

The point estimates differ, and so do the confidence intervals: the
svyratio interval is narrower for Cand1, while for Cand2 and Cand3 the
two intervals have almost the same width.
Could you give me any clue about what is going on?
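
To make the comparison concrete, here is a minimal base-R sketch on made-up
counts (not the data from this thread). It ignores the sampling weights and
the design entirely, so it is not offered as the explanation for the
discrepancy above. It only shows which quantities coincide when every row
counts equally: the ratio of totals at the unit level, the mean over an
expanded one-row-per-vote dataset, and the average of per-unit proportions.
All names and numbers are hypothetical.

```r
# Hypothetical voting units: votes for one candidate and total valid votes
units <- data.frame(
  unit       = 1:4,
  candidate1 = c(30, 10, 60, 5),
  totalVotes = c(100, 50, 400, 20)
)

# Ratio of totals: the quantity a ratio estimator targets
ratio_of_totals <- sum(units$candidate1) / sum(units$totalVotes)

# Average of per-unit proportions: every unit counts equally
mean_of_ratios <- mean(units$candidate1 / units$totalVotes)

# Expand to one row per vote: 1 = vote for candidate1, 0 = any other vote
votes <- unlist(mapply(function(k, n) rep(c(1, 0), times = c(k, n - k)),
                       units$candidate1, units$totalVotes,
                       SIMPLIFY = FALSE))
mean_expanded <- mean(votes)

print(c(ratio_of_totals = ratio_of_totals,
        mean_expanded   = mean_expanded,
        mean_of_ratios  = mean_of_ratios))
```

With equal weights, the expanded mean and the ratio of totals are the same
number, while the average of per-unit proportions is generally different
because small units count as much as large ones.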

Thank you in advance.
Sebastian

> On Thu, Oct 11, 2012 at 3:56 PM, Sebastián Daza
> <sebastian.daza at gmail.com> wrote:
>> Thank you Thomas!
>>
>> On Thu, Oct 11, 2012 at 2:33 PM, Thomas Lumley <tlumley at uw.edu> wrote:
>>> On Fri, Oct 12, 2012 at 6:56 AM, Sebastián Daza
>>> <sebastian.daza at gmail.com> wrote:
>>>> Hello,
>>>>
>>>> I have drawn a cluster sample from an election dataset for which I
>>>> already have the final results of a county-specific election. I am
>>>> trying to figure out what would be the best sampling design for my data.
>>>>
>>>> The structure of the dataset is:
>>>>
>>>> 1) polling stations (generally schools where people vote; a county
>>>> has, for example, 15 polling stations)
>>>> 2) inside each polling station, there are voting units, where people
>>>> actually vote (on average about 40 voting units per polling
>>>> station)
>>>> 3) for each voting unit I have the total votes by candidate (e.g.,
>>>> candidate 1 =322, candidate 2=122, candidate 3= 89)
>>>>
>>>> The initial sampling design is:
>>>> 1) selection of 5 polling stations PPS (based on number of voters)
>>>> 2) selection of 10 voting units (SRS)
>>>>
>>>> I am interested in estimating the proportion of votes by candidate
>>>> (let's assume we have 3 candidates). My naive estimate would be:
>>>>
>>>> votes for candidate 1 / all valid votes = proportion
>>>>
>>>> e.g.
>>>>
>>>> candidate 1= 2132 / 10874= .1961
>>>> candidate 2= 5323 / 10874= .4895
>>>> candidate 3= 3419 / 10874= .3144
>>>>
>>>> In this case, the unit of analysis is voters (or votes).
>>>>
>>>> If I specify the sampling design using the survey package in this way...
>>>>
>>>> design <- svydesign(id = ~station + unit, fpc = ~probstation + probunit,
>>>> data = sample, pps = "brewer")
>>>>
>>>> svyciprop(~I(candidate1/totalVotes), design)
>>>>
>>>> ... am I right that the unit of analysis is then the voting unit,
>>>> and that I am estimating an average across voting units?
>>>>
>>>
>>> You want a ratio estimator
>>>
>>> svyratio(~candidate1, ~totalVotes, design)
>>>
>>>
>>>    -thomas
>>>
>>> --
>>> Thomas Lumley
>>> Professor of Biostatistics
>>> University of Auckland



-- 
Sebastián Daza



