[R] Direct Method Age-Adjustment to Complex Survey Data

Thomas Lumley tlumley at uw.edu
Mon Aug 13 06:27:07 CEST 2012


On Sat, Aug 11, 2012 at 5:53 AM, Anthony Damico <ajdamico at gmail.com> wrote:
> Hi everyone, my apologies in advance if I'm overlooking something simple in
> this question.  I am trying to use R's survey package to make a direct
> method age-adjustment to some complex survey data.  I have played with
> postStratify, calibrate, rake, and simply multiplying the base weights by
> the correct proportions - nothing seems to hit the published numbers on the
> nose.
<snip>
> # but matching the figure exactly requires an exact age adjustment.
>
> # create the population types vector
> pop.types <-
>     data.frame(
>         agecat = 0:3 ,
>         Freq = c( 55901 , 77670 , 72816 , 45364 )
>     )
>
>
> z.postStratified <- postStratify( z , ~agecat , pop.types , partial = T )

The standardization in the CDC examples is within each subpopulation.
That is, they standardise each race/ethnicity group to the Census age
structure, rather than standardising the whole population.  That's the
whole point -- they want to look at an imaginary population where age
and race aren't confounded.
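
As a rough sketch of the arithmetic behind that, with made-up numbers: the
age-specific rates and the group names below are purely illustrative
assumptions; only the standard population counts are the ones quoted above.
Each group's age-specific rates are averaged using the same standard age
distribution, so differences between the adjusted figures no longer reflect
differences in age structure.

```r
# Standard population counts by age band (same figures as pop.types above)
popage <- c(55901, 77670, 72816, 45364)
std <- popage / sum(popage)          # standard age shares, summing to 1

# Hypothetical age-specific prevalences for two groups (illustrative only)
rates <- rbind(groupA = c(0.02, 0.08, 0.20, 0.22),
               groupB = c(0.03, 0.10, 0.18, 0.25))

# Direct standardization: weight each group's age-specific rates
# by the same standard age shares
adjusted <- drop(rates %*% std)
adjusted
```

The post-stratification in the code further down should amount to the same
thing, done on the survey weights rather than on the rates themselves.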

When I do this, the estimates match the published figures almost exactly.
The next step was to drop the records with missing cholesterol values and
reweight just the non-missing data.  That matches exactly.  (I also think
your recoding of RIDRETH1 is wrong.)

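library(foreign)  # for read.xport()
library(survey)

demog<-read.xport("~/Downloads/demo_f.xpt")
chol<-read.xport("~/Downloads/TCHOL_f.xpt")
alldata<-merge(demog,chol)
# keep respondents who were both interviewed and examined
alldata<-subset(alldata, RIDSTATR %in% 2)
alldata<-transform(alldata,  HI_CHOL = ifelse(LBXTC>=240,1,0))
# collapse the five RIDRETH1 codes into four groups (codes 1 and 2 combined)
alldata<-transform(alldata, race=c(1,1,2,3,4)[RIDRETH1])
alldata<-transform(alldata, agecat=cut(RIDAGEYR,c(0,19,39,59, Inf)))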

popage<-c(55901,77670,72816,45364)

# the design has to exist before svytable() can use it
design <- svydesign(id=~SDMVPSU, strata=~SDMVSTRA, nest=TRUE,
    weights=~WTMEC2YR, data=alldata)

# weighted race x gender totals, split by the standard age distribution
racegender<-as.data.frame(svytable(~race+RIAGENDR,design))
racegenderage<-expand.grid(race=1:4,RIAGENDR=1:2,agecat=levels(alldata$agecat))
racegenderage$Freq<- as.vector(outer(racegender$Freq, popage/sum(popage)))

svyby(~HI_CHOL, ~race+RIAGENDR,
    design=subset(postStratify(design, ~race+RIAGENDR+agecat, racegenderage),
        RIDAGEYR>=20),
    svymean, na.rm=TRUE)



# drop records with no total cholesterol measurement, then rebuild the design
somedata<-subset(alldata, !is.na(LBXTC))
design1 <- svydesign(id=~SDMVPSU, strata=~SDMVSTRA, nest=TRUE,
    weights=~WTMEC2YR, data=somedata)

svyby(~HI_CHOL, ~race+RIAGENDR,
    design=subset(postStratify(design1, ~race+RIAGENDR+agecat, racegenderage),
        RIDAGEYR>=20),
    svymean, na.rm=TRUE)
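
The racegenderage table above relies on expand.grid() and
as.vector(outer(...)) unrolling in the same order, first variable fastest.
Here is a small sketch of that alignment on toy numbers; the totals, the
three age bands and the name "target" are assumptions for illustration and
not NHANES values.

```r
# Toy weighted totals for 2 race groups x 2 genders (made-up numbers),
# in the order as.data.frame(<table>) uses: first factor varies fastest
racegender <- expand.grid(race = 1:2, RIAGENDR = 1:2)
racegender$Freq <- c(100, 200, 300, 400)

# Standard age shares for three hypothetical age bands
ageshare <- c(young = 0.5, middle = 0.3, old = 0.2)

# expand.grid() also varies its first argument fastest, so its rows line up
# with the column-major unrolling of the outer() matrix
target <- expand.grid(race = 1:2, RIAGENDR = 1:2, agecat = names(ageshare))
target$Freq <- as.vector(outer(racegender$Freq, ageshare))

# Each race x gender cell keeps its total, now split by the standard age shares
aggregate(Freq ~ race + RIAGENDR, data = target, FUN = sum)
```

If the rows did not line up, the aggregated totals would not reproduce the
race by gender totals that went in, so this is a cheap check to run on the
real table too.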


   -thomas

-- 
Thomas Lumley
Professor of Biostatistics
University of Auckland


