[R] Using "survey" package with ACS PUMS

Michael.Laviolette at dhhs.state.nh.us Michael.Laviolette at dhhs.state.nh.us
Tue Sep 30 15:17:28 CEST 2014


I'm trying to reproduce some results from the American Community Survey
PUMS data using the "survey" package. I'm using the one-year 2012 estimates
for New Hampshire
(http://www2.census.gov/acs2012_1yr/pums/csv_pnh.zip) and comparing to the
estimates for user verification from
http://www.census.gov/acs/www/Downloads/data_documentation/pums/Estimates/pums_estimates_12.csv

Once the age groups are set up as specified in the verification estimates,
the following SAS code produces the correct estimated totals with standard
errors:

proc surveyfreq data = acs2012 varmethod = jackknife;
  weight pwgtp;
  repweights pwgtp1 -- pwgtp80 / jkcoefs = 0.05;
  table SEX agegroup;
run;

I've not been successful in reproducing the standard errors with R,
although they are very close. My code follows; what revisions do I need to
make?

Thanks,
Mike L.

# load estimates for verification
pums_est <- read.csv("pums_estimates_12.csv")
pums_est[,4] <- as.integer(gsub(",", "", pums_est[,4]))

# load PUMS data
pums_p <- read.csv("ss12pnh.csv")
# convert sex and age group to factors
pums_p$SEX <- factor(pums_p$SEX, labels = c("M","F"))
pums_p$agegrp <- cut(pums_p$AGEP,
                     c(0,5,10,15,20,25,35,45,55,60,65,75,85,100),
                     right = FALSE)

# create replicate-weight survey object
library(survey)
pums_p.rep <- svrepdesign(repweights = pums_p[207:286],
                          weights = pums_p[,7],
                          combined.weights = TRUE,
                          type = "Fay", rho = 1 - 1/sqrt(4),
                          scale = 1, rscales = 1,
                          data = pums_p)

# using type "JK1" with scale = 4/80 and rscales = rep(1,80)
#   seems to produce the same results

# total population by sex with SE's
by.sex <- svyby(~SEX, ~ST, pums_p.rep, svytotal, na.rm = TRUE)
round(by.sex[1,4:5])
#     se1  se2
# 33 1606 1606
# compare results with Census
pums_est[966:967, 5]
#[1] 1610 1610

# total population by age group with SE's
by.agegrp <- svyby(~agegrp, ~ST, pums_p.rep, svytotal, na.rm = TRUE)
round((by.agegrp)[15:27])
#       se1  se2  se3  se4  se5  se6  se7  se8  se9 se10 se11 se12 se13
#    33 874 2571 2613 1463 1398 1475 1492 1552 2191 2200  880 1700 1678
# compare results with Census
pums_est[968:980, 5]
#  [1]  874 2578 2613 1463 1399 1476 1493 1555 2191 2200  880 1702 1684



More information about the R-help mailing list