[R] svy / weighted regression

Laust laust.mortensen at gmail.com
Fri Oct 9 13:18:33 CEST 2009


Dear list,

I am trying to set up a propensity-weighted regression using the
survey package. Most of my population is sampled with a sampling
probability of one (that is, I have the full population). However, for
a subset of the data I have only a 50% sample of the full population.
In previous work on the data, I analyzed these data using SAS and
STATA. In those packages I used a propensity weight of 1/[sampling
probability] in various generalized linear regression-procedures, but
I am having trouble setting this up. I bet the solution is simple, but
I’m a R newbie. Code to illustrate my problem below.

Thanks
Laust

# loading survey
library(survey)

# creating data
listc <- c("Denmark","Finland","Norway","Sweden","Denmark","Finland","Norway","Sweden")
listw <- c(1,2,1,1,1,1,1,1)
listd <- c(0,0,0,0,1000,1000,1000,2000)
listt <- c(750000,500000,900000,1900000,5000,5000,5000,10000)
list.cwdt <- c(listc, listw, listd, listt)
country <- data.frame(country=listc,weight=listw,deaths=listd,yrs_at_risk=listt)

# running a frequency weighted regression to get the correct point
estimates for comparison
glm <- glm(deaths ~ country + offset(log(yrs_at_risk)),
weights=weight, data=country, family=poisson())
summary(glm)
regTermTest(glm, ~ country)

# running survey weighted regression
svy <- svydesign(~0,,data=country, weight=~weight)
svyglm <- svyglm(deaths ~ country + offset(log(yrs_at_risk)),
design=svy, data=country, family=poisson())
summary(svyglm)
# point estimates are correct, but standard error is way too large
regTermTest(svyglm, ~ country)
# test indicates no country differences




More information about the R-help mailing list