[R] confidence intervals for differences in proportions from complex survey design?
Brown, Tony Nicholas
tony.n.brown at Vanderbilt.Edu
Mon May 11 06:40:19 CEST 2015
All:
I need to generate confidence intervals for differences in proportions using data from a complex survey design. An example follows where I attempt to estimate the difference in depression prevalence by sex.
# Data might look something like this:
Dfr <- data.frame(depression=sample(c("yes","no"), size=30, replace=TRUE),
                  sex=sample(c("M","F"), size=30, replace=TRUE),
                  cluster=rep(1:10, times=3),
                  stratum=rep(1:5, each=2, times=3),
                  pweight=runif(n=30, min=1, max=3))
Dfr
library(survey)
msdesign <- svydesign(id=~cluster, strata=~stratum, weights=~pweight, nest=TRUE,
                      data=Dfr)
# When searching online, one recommendation was to use svyglm() to generate an
# approximation as follows:
# (the design object already carries the data, so no with(Dfr, ...) wrapper is needed)
confint(svyglm(I(depression=="yes")~sex, design=msdesign,
               family=gaussian(link="identity")),
        level=0.95, method="Wald")
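As a sanity check on what the svyglm() approximation estimates, here is a minimal sketch, assuming the simulated data above. It compares the sex coefficient from the identity-link model with the difference between the design-weighted proportions from svyby(). The seed, the helper column dep01, and the object names fit, by_sex and diff_direct are my own additions for illustration only.

```r
library(survey)

set.seed(1)  # arbitrary seed, used only to make the illustration reproducible
Dfr <- data.frame(depression = sample(c("yes", "no"), size = 30, replace = TRUE),
                  sex        = sample(c("M", "F"), size = 30, replace = TRUE),
                  cluster    = rep(1:10, times = 3),
                  stratum    = rep(1:5, each = 2, times = 3),
                  pweight    = runif(n = 30, min = 1, max = 3))

# dep01 is a hypothetical helper column: 1 = depressed, 0 = not depressed
Dfr$dep01 <- as.numeric(Dfr$depression == "yes")

msdesign <- svydesign(id = ~cluster, strata = ~stratum, weights = ~pweight,
                      nest = TRUE, data = Dfr)

# Identity-link (linear probability) model: the sexM coefficient is the
# weighted proportion depressed among M minus that among F
fit <- svyglm(dep01 ~ sex, design = msdesign, family = gaussian())
print(coef(fit)["sexM"])
print(confint(fit, level = 0.95, method = "Wald"))

# Design-weighted proportion depressed within each sex, for comparison
by_sex <- svyby(~dep01, ~sex, msdesign, svymean)
diff_direct <- by_sex$dep01[by_sex$sex == "M"] - by_sex$dep01[by_sex$sex == "F"]

# The two point estimates should agree up to numerical tolerance
print(all.equal(unname(coef(fit)["sexM"]), diff_direct))
```

This only confirms that the point estimate is the difference in proportions; whether the Wald interval from the model is the most appropriate interval for that difference is the open question.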
This question has been asked before on the listserv (circa 2007) and I contacted the original poster, who indicated that they never received a reply.
Here is the question as described by the original poster:
"I'm trying to get confidence intervals of proportions (sometimes for
subgroups) estimated from complex survey data. Because a function like
prop.test() does not exist for the "survey" package I tried the following:
1) Define a survey object (PSU of clustered sample, population weights);
2) Use svyglm() of the package "survey" to estimate a binary logistic
regression (family='binomial'): For the confidence interval of a single
proportion regress the binary dependent variable on a constant (1), for
confidence intervals of that variable for subgroups regress this
variable on the groups (factor) variable;
3) Use predict() to obtain estimated logits and the respective standard
errors (mod.dat specifying either the constant or the subgroups):
pred=predict(model,mod.dat,type='link',se.fit=T)
and apply the following to obtain the proportion with its confidence
intervals (for example, for conf.level=.95):
lo.e = pred[1:length(pred)]-qnorm((1+conf.level)/2)*SE(pred)
hi.e = pred[1:length(pred)]+qnorm((1+conf.level)/2)*SE(pred)
prop = 1/(1+exp(-pred[1:length(pred)]))
lo = 1/(1+exp(-lo.e))
hi = 1/(1+exp(-hi.e))
I think that in that way I get CI's based on asymptotic normality -
either for a single proportion or split up into subgroups.
Question: Is this a correct or a defensible procedure? Or should I use a
different approach? Note that this approach should also allow to
estimate CI's for proportions of subgroups taking into account the
complex survey design."
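The back-transformation in step 3 of the quoted procedure can be sketched in base R as follows. The logit estimate and standard error are made-up numbers standing in for predict() output, not results from any real model; plogis() is the base R inverse logit and is equivalent to the 1/(1+exp(-x)) expression in the quote.

```r
# Hypothetical logit-scale estimate and standard error (illustrative values only)
logit_est  <- -0.4
logit_se   <- 0.35
conf.level <- 0.95

# Wald interval on the logit scale
z    <- qnorm((1 + conf.level) / 2)
lo.e <- logit_est - z * logit_se
hi.e <- logit_est + z * logit_se

# Back-transform to the probability scale
prop <- plogis(logit_est)   # same as 1 / (1 + exp(-logit_est))
lo   <- plogis(lo.e)
hi   <- plogis(hi.e)

print(c(prop = prop, lo = lo, hi = hi))
```

Intervals built this way always stay inside (0, 1), but they are intervals for each proportion separately, not for the difference between two proportions.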
Thanks in advance for any help that you can provide.
Tony
------------------------------------------------------------------------------
Tony N. Brown, Ph.D.
Associate Chair and Associate Professor of Sociology
Google Scholar Profile: http://tinyurl.com/lozlht8
LinkedIn Profile: https://www.linkedin.com/pub/tony-nicholas-brown/a6/64/31a