[R] confidence intervals for differences in proportions from complex survey design?

Mon May 11 07:52:25 CEST 2015

i don't know the answer to your larger question, but for confidence
intervals around proportions you might look at ?svyciprop.  one of the
method= options might yield the same result as your approximation, not sure

On Mon, May 11, 2015 at 12:40 AM, Brown, Tony Nicholas <
tony.n.brown at vanderbilt.edu> wrote:

> All:
>
> I need to generate confidence intervals for differences in proportions
> using data from a complex survey design. An example follows where I attempt
> to estimate the difference in depression prevalence by sex.
>
> # Data might look something like this:
> Dfr<-data.frame(depression=sample(c("yes","no"), size=30, replace=TRUE),
>         sex=sample(c("M","F"), size=30, replace=TRUE),
>         cluster=rep(1:10, times=3),
>         stratum=rep(1:5, each=2, times=3),
>         pweight=runif(n=30, min=1, max=3))
> Dfr
> library(survey)
> msdesign<-svydesign(id=~cluster, strata=~stratum, weights=~pweight,
> nest=TRUE,
>         data=Dfr)
> # When searching online, one recommendation was to use svyglm() to
> generate an
> # approximation as follows:
> confint(with(Dfr, svyglm(I(depression=="yes")~sex,
> family=gaussian(link=identity),
>         msdesign)), level=0.95, method="Wald")
>
> This question has been asked before on the listserv (circa 2007) and I
> contacted the original poster, who indicated that they never received a
> reply.
>
> Here is the question as described by the original poster:
>
> "I'm trying to get confidence intervals of proportions (sometimes for
> subgroups) estimated from complex survey data. Because a function like
> prop.test() does not exist for the "survey" package I tried the following:
>
> 1) Define a survey object (PSU of clustered sample, population weights);
> 2) Use svyglm() of the package "survey" to estimate a binary logistic
> regression (family='binomial'): For the confidence interval of a single
> proportion regress the binary dependent variable on a constant (1), for
> confidence intervals of that variable for subgroups regress this
> variable on the groups (factor) variable;
> 3) Use predict() to obtain estimated logits and the respective standard
> errors (mod.dat specifying either the constant or the subgroups):
>
>     pred=predict(model,mod.dat,type='link',se.fit=T)
>
> and apply the following to obtain the proportion with its confidence
> intervals (for example, for conf.level=.95):
>
>     lo.e = pred[1:length(pred)]-qnorm((1+conf.level)/2)*SE(pred)
>     hi.e = pred[1:length(pred)]+qnorm((1+conf.level)/2)*SE(pred)
>     prop = 1/(1+exp(-pred[1:length(pred)]))
>     lo = 1/(1+exp(-lo.e))
>     hi = 1/(1+exp(-hi.e))
>
> I think that in that way I get CI's based on asymptotic normality -
> either for a single proportion or split up into subgroups.
>
> Question: Is this a correct or a defensible procedure? Or should I use a
> different approach? Note that this approach should also allow to
> estimate CI's for proportions of subgroups taking into account the
> complex survey design."
>
> Thanks in advance for any help that you can provide.
>
> Tony
>
>
> ------------------------------------------------------------------------------
> Tony N. Brown, Ph.D.
> Associate Chair and Associate Professor of Sociology
> Google Scholar Profile: http://tinyurl.com/lozlht8
> LinkedIn Profile:
> https://www.linkedin.com/pub/tony-nicholas-brown/a6/64/31a
>
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

	[[alternative HTML version deleted]]