[R] Using "survey" package with ACS PUMS

Michael.Laviolette at dhhs.state.nh.us Michael.Laviolette at dhhs.state.nh.us
Tue Sep 30 15:51:11 CEST 2014


Setting the options did the trick. The code using "Fay" came from another
post, but this works:

options( "survey.replicates.mse" = TRUE)
pums_p.rep <- svrepdesign(repweights = pums_p[207:286],
                          weights = ~PWGTP,
                          combined.weights = TRUE,
                          type = "JK1",
                          scale = 4/80, rscales = rep(1, 80),
                          data = pums_p)

Thanks!
Mike L.


hi michael, you probably need

options( "survey.replicates.mse" = TRUE )


i also think you don't want type = "Fay" and you do want scale = 4/80 and
rscales = rep( 1 , 80 )  as well, but what you have might be equivalent
(not sure)


regardless, this blog post details how to precisely replicate the census
bureau's estimates using the acs pums with R

http://www.asdfree.com/search/label/american%20community%20survey%20%28acs%29




On Tue, Sep 30, 2014 at 9:17 AM, <Michael.Laviolette at dhhs.state.nh.us>
wrote:

  I'm trying to reproduce some results from the American Community Survey
  PUMS data using the "survey" package. I'm using the one-year 2012
  estimates
  for New Hampshire
  (http://www2.census.gov/acs2012_1yr/pums/csv_pnh.zip) and comparing to
  the
  estimates for user verification from
  http://www.census.gov/acs/www/Downloads/data_documentation/pums/Estimates/pums_estimates_12.csv


  Once the age groups are set up as specified in the verification
  estimates,
  the following SAS code produces the correct estimated totals with
  standard
  errors:

  proc surveyfreq data = acs2012 varmethod = jackknife;
    weight pwgtp;
    repweights pwgtp1 -- pwgtp80 / jkcoefs = 0.05;
    table SEX agegroup;
  run;

  I've not been successful in reproducing the standard errors with R,
  although they are very close. My code follows; what revisions do I need
  to
  make?

  Thanks,
  Mike L.

  # load estimates for verification
  pums_est <- read.csv("pums_estimates_12.csv")
  pums_est[,4] <- as.integer(gsub(",", "", pums_est[,4]))

  # load PUMS data
  pums_p <- read.csv("ss12pnh.csv")
  # convert sex and age group to factors
  pums_p$SEX <- factor(pums_p$SEX, labels = c("M","F"))
  pums_p$agegrp <- cut(pums_p$AGEP,
                       c(0,5,10,15,20,25,35,45,55,60,65,75,85,100),
                       right = FALSE)

  # create replicate-weight survey object
  library(survey)
  pums_p.rep <- svrepdesign(repweights = pums_p[207:286],
                            weights = pums_p[,7],
                            combined.weights = TRUE,
                            type = "Fay", rho = 1 - 1/sqrt(4),
                            scale = 1, rscales = 1,
                            data = pums_p)

  # using type "JK1" with scale = 4/80 and rscales = rep(1,80)
  #   seems to produce the same results

  # total population by sex with SE's
  by.sex <- svyby(~SEX, ~ST, pums_p.rep, svytotal, na.rm = TRUE)
  round(by.sex[1,4:5])
  #     se1  se2
  # 33 1606 1606
  # compare results with Census
  pums_est[966:967, 5]
  #[1] 1610 1610

  # total population by age group with SE's
  by.agegrp <- svyby(~agegrp, ~ST, pums_p.rep, svytotal, na.rm = TRUE)
  round((by.agegrp)[15:27])
  #       se1  se2  se3  se4  se5  se6  se7  se8  se9 se10 se11 se12 se13
  #    33 874 2571 2613 1463 1398 1475 1492 1552 2191 2200  880 1700 1678
  # compare results with Census
  pums_est[968:980, 5]
  #  [1]  874 2578 2613 1463 1399 1476 1493 1555 2191 2200  880 1702 1684





More information about the R-help mailing list