[R] Quantiles on multiply imputed survey data - mitools

Anne Bichteler abichteler at toxstrategies.com
Tue May 10 22:33:17 CEST 2016


Hello, and thank you for considering this question:

The svystat object created with multiply imputed NHANES data files is failing on calling survey::svyquantile. I'm wondering if I'm diagnosing the issue correctly, whether the behavior is expected, and whether y'all might have any ideas for workarounds.

I'm following T. Lumley's general method outlined here: http://faculty.washington.edu/tlumley/old-survey/svymi.html, but with data files I've imputed myself on the 2001/2002 biennial. Each file has 1081 observations and no missing values.

### Create the survey design object with list of imputed data files ImputedList0102.
des <- svydesign(id=~SDMVPSU, strat=~SDMVSTRA, weight=~WTSPO2YR, data=imputationList(ImputedList0102), nest=TRUE)


### Blood analyte of interest
var_name <- "LBXTCD" # analyte in blood serum

### All is well calculating the mean:
M <- with(des, svymean(make.formula(get('var_name'))))
summary(M)
Result <- MIcombine(M)
Result$coefficients
# LBXTCD 
# 17.41635


### but svystat object fails to calculate a 50th percentile:
### it fails when hard-coding the name rather than using make.formula;
### it fails regardless of number of files or choices in handling ties or interval type.
### There are 16 ties in each data file.
M1 <- with(des, svyquantile(make.formula(get('var_name')), quantiles = c(.5)))
summary(M1)

#     Length Class  Mode   
#[1,] 1      -none- numeric
#[2,] 1      -none- numeric
#[3,] 1      -none- numeric


### The quantile is successfully calculated on one file at a time, however, and is different for each file.
### (had thought perhaps there was a lack-of-variance issue). The quantile calculated on each file
### is the same regardless of interval.type.
des_single1 <- svydesign(id=~SDMVPSU, strat=~SDMVSTRA, weight=~WTSPO2YR, data=ImputedList0102[[1]], nest=TRUE)
svyquantile(make.formula(get('var_name')), des_single1, c(.5))
# 0.5
# LBXTCD 13.5554


des_single2 <- svydesign(id=~SDMVPSU, strat=~SDMVSTRA, weight=~WTSPO2YR, data=ImputedList0102[[2]], nest=TRUE)
svyquantile(make.formula(get('var_name')), des_single2, c(.5))
# 0.5
# LBXTCD 14.06154

# The number of observations exceeding the 50th percentile differs for each file, which I can't claim to understand.

# I removed the 16 ties, but no help. Do the ties and/or different number of observations above/below prevent the svydesigns from being combined?
nrow(subset(ImputedList0102[[1]], LBXTCD > 13.5554))
# [1] 516
nrow(subset(ImputedList0102[[2]], LBXTCD > 14.06154))
# [1] 512


I'm hoping someone can point me to some gross error I'm making or another function parameter or data manipulation or another survey-savvy method altogether to calculate a 50th percentile across multiply imputed data files. Thanks for any advice,

Brennan

www.toxstrategies.com


More information about the R-help mailing list