[R] confidence intervals of proportions from complex surveys

Dirk Enzmann dirk.enzmann at uni-hamburg.de
Thu Sep 6 00:55:53 CEST 2007


This is partly an R and partly a general statistics question.

I'm trying to get confidence intervals of proportions (sometimes for 
subgroups) estimated from complex survey data. Because the "survey" 
package has no function like prop.test(), I tried the following:

1) Define a survey design object (PSUs of the clustered sample, 
population weights);
2) Use svyglm() from the "survey" package to fit a binary logistic 
regression (family='binomial'): for the confidence interval of a single 
proportion, regress the binary dependent variable on a constant (1); for 
confidence intervals of that variable within subgroups, regress it on 
the grouping (factor) variable;
3) Use predict() to obtain the estimated logits and their standard 
errors (mod.dat specifying either the constant or the subgroups):

    pred <- predict(model, newdata = mod.dat, type = 'link', se.fit = TRUE)

and apply the following to obtain the proportion with its confidence 
intervals (for example, for conf.level=.95):

    conf.level <- .95
    z    <- qnorm((1 + conf.level) / 2)  # normal quantile, about 1.96 for 95%
    est  <- coef(pred)                   # estimated logits
    lo.e <- est - z * SE(pred)           # CI limits on the logit scale
    hi.e <- est + z * SE(pred)
    prop <- plogis(est)                  # back-transform: 1/(1+exp(-x))
    lo   <- plogis(lo.e)
    hi   <- plogis(hi.e)

I think this gives CIs based on the asymptotic normality of the 
estimated logits, either for a single proportion or separately for each 
subgroup.
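
To make the back-transformation concrete, here is a minimal sketch in 
base R. The helper name logit_ci() and the input values are made up for 
illustration; they are not part of the "survey" package. It only shows 
that the interval is symmetric on the logit scale and therefore stays 
inside (0, 1) after transformation:

```r
# Hypothetical helper (not from any package): Wald interval computed on
# the logit scale, then back-transformed to the probability scale.
logit_ci <- function(logit, se, conf.level = 0.95) {
  z <- qnorm((1 + conf.level) / 2)   # two-sided normal quantile
  data.frame(prop = plogis(logit),
             lo   = plogis(logit - z * se),
             hi   = plogis(logit + z * se))
}

# Made-up inputs: two estimated logits with their standard errors.
res <- logit_ci(logit = c(0, -1.5), se = c(0.2, 0.4))
print(res)
```

For a logit of 0 the interval is symmetric around 0.5; for other values 
it is asymmetric around the estimated proportion.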

Question: Is this a correct or at least a defensible procedure? Or 
should I use a different approach? Note that the approach should also 
allow estimating CIs for subgroup proportions while taking the complex 
survey design into account.
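
For concreteness, steps 1) to 3) can be sketched end to end as follows. 
This is only an illustration under assumptions: it uses the apiclus1 
example data that ships with the "survey" package instead of my own 
data, with dnum as PSU, pw as weight, sch.wide as the binary variable 
and stype as the subgroup factor. It also uses quasibinomial() rather 
than binomial(), which gives the same point estimates but avoids 
warnings about non-integer counts caused by the weights:

```r
library(survey)

# Example data from the survey package: a cluster sample of California
# schools in which school districts (dnum) are the PSUs.
data(api)

# 1) Survey design object: PSU = dnum, sampling weights = pw.
des <- svydesign(id = ~dnum, weights = ~pw, data = apiclus1)

# 2) Logistic regression of the binary variable on the subgroup factor.
#    sch.wide is a factor with levels "No"/"Yes", so P(Yes) is modelled.
model <- svyglm(sch.wide ~ stype, design = des, family = quasibinomial())

# 3) Estimated logits and design-based standard errors for each subgroup.
mod.dat <- data.frame(stype = factor(levels(apiclus1$stype),
                                     levels = levels(apiclus1$stype)))
pred <- predict(model, newdata = mod.dat, type = "link", se.fit = TRUE)

# Back-transform the logit-scale interval to proportions.
conf.level <- 0.95
z <- qnorm((1 + conf.level) / 2)
ci <- data.frame(group = mod.dat$stype,
                 prop  = plogis(coef(pred)),
                 lo    = plogis(coef(pred) - z * SE(pred)),
                 hi    = plogis(coef(pred) + z * SE(pred)))
print(ci)
```

For a single overall proportion the formula would be sch.wide ~ 1 and 
mod.dat a one-row data frame.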

TIA,
Dirk

********************************
R version 2.5.1 Patched (2007-08-10 r42469)
i386-pc-mingw32


