[R] Quantiles on multiply imputed survey data - mitools

Anthony Damico ajdamico at gmail.com
Wed May 11 18:17:08 CEST 2016


hi, you want   se=T

M_quantile <- with(des_mult, svyquantile(make.formula(get('var_name')),
quantiles = c(.5),se=T))
MIcombine(M_quantile)



Multiple imputation results:
      with(des_mult, svyquantile(make.formula(get("var_name")), quantiles =
c(0.5),
    se = T))
      MIcombine.default(M_quantile)
       results       se
LBXTCD 12.7978 6.917285






On Wed, May 11, 2016 at 12:09 PM, Anne Bichteler <
abichteler at toxstrategies.com> wrote:

> Thanks for looking. No, for the quantiles it fails to instantiate the
> collection of designs correctly, whether hard-coding the variable name or
> using make.formula. 'with' passes make.formula correctly when calculating
> the mean, e.g. this works:
>
> MIcombine( with(des, svymean(make.formula(get('var_name')))))
>
> # Here's a reproducible example.
>
> DF1 <- data.frame(SDMVPSU = c(1,1,1,1,1,2,2,2,2,2),
>                   SDMVSTRA = c(22, 20, 24, 18, 20, 22, 20, 24, 18, 20),
>                   WTSPO2YR = c(252605, 82199, 24946, 147236, 3679, 294959,
> 65085, 21765, 197775, 49931),
>                   LBXTCD = c(20.4, 29.7, 8.8, 18.0, 22.2, 10.4, 43.9,
> 15.3, 13.8, 84.5))
>
> DF2 <- data.frame(SDMVPSU = c(1,1,1,1,1,2,2,2,2,2),
>                   SDMVSTRA = c(22, 20, 24, 18, 20, 22, 20, 24, 18, 20),
>                   WTSPO2YR = c(252605, 82199, 24946, 147236, 3679, 294959,
> 65085, 21765, 197775, 49931),
>                   LBXTCD = c(21.9, 29.7, 9.2, 5.9, 32.8, 8.9, 43.9, 7.4,
> 10.5, 84.5))
>
> var_name <- "LBXTCD"
>
> # Individually svyquantile (and svymean) work:
> des_single1 <- svydesign(id=~SDMVPSU, strat=~SDMVSTRA, weight=~WTSPO2YR,
> data=Df1_red, nest=TRUE)
> svyquantile(make.formula(get('var_name')), des_single1, c(.5), na.rm =
> FALSE)
>
> des_single2 <- svydesign(id=~SDMVPSU, strat=~SDMVSTRA, weight=~WTSPO2YR,
> data=Df2_red, nest=TRUE)
> svyquantile(make.formula(get('var_name')), des_single2, c(.5), na.rm =
> FALSE)
>
> Imputed_list <- c()
> Imputed_list[[1]] <- DF1
> Imputed_list[[2]] <- DF2
>
> # svymean works (so the svydesign object is fine?) but svyquantile doesn't:
> des_mult <- svydesign(id=~SDMVPSU, strat=~SDMVSTRA, weight=~WTSPO2YR,
> data=imputationList(Imputed_list), nest=TRUE)
> M_mean <- with(des_mult, svymean(make.formula(get('var_name'))))
> summary(M_mean)
> M_quantile <- with(des_mult, svyquantile(make.formula(get('var_name')),
> quantiles = c(.5)))
> summary(M_quantile)
>
>
> Thanks again,
>
> Brennan
>
> www.toxstrategies.com
>
>
> From:  Anthony Damico <ajdamico at gmail.com>
> Date:  Tuesday, May 10, 2016 at 10:37 PM
> To:  Anne Bichteler <abichteler at toxstrategies.com>
> Cc:  "r-help at r-project.org" <r-help at r-project.org>
> Subject:  Re: [R] Quantiles on multiply imputed survey data - mitools
>
>
> is the `with` not passing make.formula( get( 'var_name' ) ) through to
> svyquantile for some reason?  does this work?
>
> MIcombine( with(des, svyquantile(~LBXTCD, .5)))
>
>
>
> if that's not it, could you make a minimal reproducible example that
> includes the data download?  code to download and import nhanes here
>
>
> https://github.com/ajdamico/asdfree/tree/master/National%20Health%20and%20Nutrition%20Examination%20Survey
>
>
>
>
>
> On Tue, May 10, 2016 at 4:33 PM, Anne Bichteler
> <abichteler at toxstrategies.com> wrote:
>
> Hello, and thank you for considering this question:
>
> The svystat object created with multiply imputed NHANES data files is
> failing on calling survey::svyquantile. I'm wondering if I'm diagnosing the
> issue correctly, whether the behavior is expected, and whether y'all might
> have any ideas for workarounds.
>
> I'm following T. Lumley's general method outlined here:
> http://faculty.washington.edu/tlumley/old-survey/svymi.html <
> http://faculty.washington.edu/tlumley/old-survey/svymi.html>, but with
> data files I've imputed myself on the 2001/2002 biennial. Each file has
> 1081 observations and no missing values.
>
> ### Create the survey design object with list of imputed data files
> ImputedList0102.
> des <- svydesign(id=~SDMVPSU, strat=~SDMVSTRA, weight=~WTSPO2YR,
> data=imputationList(ImputedList0102), nest=TRUE)
>
>
> ### Blood analyte of interest
> var_name <- "LBXTCD" # analyte in blood serum
>
> ### All is well calculating the mean:
> M <- with(des, svymean(make.formula(get('var_name'))))
> summary(M)
> Result <- MIcombine(M)
> Result$coefficients
> # LBXTCD
> # 17.41635
>
>
> ### but svystat object fails to calculate a 50th percentile:
> ### it fails when hard-coding the name rather than using make.formula;
> ### it fails regardless of number of files or choices in handling ties or
> interval type.
> ### There are 16 ties in each data file.
> M1 <- with(des, svyquantile(make.formula(get('var_name')), quantiles =
> c(.5)))
> summary(M1)
>
> #     Length Class  Mode
> #[1,] 1      -none- numeric
> #[2,] 1      -none- numeric
> #[3,] 1      -none- numeric
>
>
> ### The quantile is successfully calculated on one file at a time,
> however, and is different for each file.
> ### (had thought perhaps there was a lack-of-variance issue). The quantile
> calculated on each file
> ### is the same regardless of interval.type.
> des_single1 <- svydesign(id=~SDMVPSU, strat=~SDMVSTRA, weight=~WTSPO2YR,
> data=ImputedList0102[[1]], nest=TRUE)
> svyquantile(make.formula(get('var_name')), des_single1, c(.5))
> # 0.5
> # LBXTCD 13.5554
>
>
> des_single2 <- svydesign(id=~SDMVPSU, strat=~SDMVSTRA, weight=~WTSPO2YR,
> data=ImputedList0102[[2]], nest=TRUE)
> svyquantile(make.formula(get('var_name')), des_single2, c(.5))
> # 0.5
> # LBXTCD 14.06154
>
> # The number of observations exceeding the 50th percentile differs for
> each file, which I can't claim to understand.
>
> # I removed the 16 ties, but no help. Do the ties and/or different number
> of observations above/below prevent the svydesigns from being combined?
> nrow(subset(ImputedList0102[[1]], LBXTCD > 13.5554))
> # [1] 516
> nrow(subset(ImputedList0102[[2]], LBXTCD > 14.06154))
> # [1] 512
>
>
> I'm hoping someone can point me to some gross error I'm making or another
> function parameter or data manipulation or another survey-savvy method
> altogether to calculate a 50th percentile across multiply imputed data
> files. Thanks for any advice,
>
> Brennan
>
> www.toxstrategies.com <http://www.toxstrategies.com>
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html <
> http://www.R-project.org/posting-guide.html>
> and provide commented, minimal, self-contained, reproducible code.
>
>
>
>
>
>

	[[alternative HTML version deleted]]



More information about the R-help mailing list