[R] Multiple Imputation in mice/norm

Bert Gunter gunter.berton at gene.com
Mon Apr 27 19:02:30 CEST 2009


Folks:

A comment ... subject to vigorous refutation, since it's jmui (just my
uninformed opinion).

It strikes me that this is a case where one may need to own up to the
limitations of the data and be transparent about the tentativeness of the
statistical approaches. I say this because the statistical literature and
popular perception often suggest that statistical methodology can
overcome these limitations and produce definitive answers in spite of them.
And, of course, statistical researchers tend to be enamored of their
clever methodology and gloss over the inevitable fact that their proofs
begin with "assume that ..." (reminding me of the old saw that "assume" can
make an ass out of u and me).

Perhaps a useful approach is sensitivity analysis: try several quite
different approaches, each consistent with one reasonable set of
assumptions, and see how they compare. Not a new idea, of course, but
perhaps one worth being reminded of in such situations.
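
For instance, a minimal sketch (placeholder data and variable names;
I'm assuming the mice and Hmisc/Design packages here):

    library(mice)      # chained-equations multiple imputation
    library(Hmisc)     # aregImpute and fit.mult.impute
    library(Design)    # ols, used as the fitter below

    ## Approach 1: mice with its default methods
    imp1 <- mice(mydata, m = 5)
    fit1 <- pool(with(imp1, lm(y ~ x1 + x2)))

    ## Approach 2: aregImpute, which curtails imputations to the
    ## observed range of each variable
    imp2 <- aregImpute(~ y + x1 + x2, data = mydata, n.impute = 5)
    fit2 <- fit.mult.impute(y ~ x1 + x2, ols, imp2, data = mydata)

    ## If the coefficients and standard errors roughly agree, the
    ## conclusions are less sensitive to the imputation assumptions.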

As always, thanks for *your* knowledgeable summary of exactly these
matters, Frank.

-- Bert Gunter
Genentech

-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On
Behalf Of Frank E Harrell Jr
Sent: Saturday, April 25, 2009 3:38 PM
To: David Winsemius
Cc: Emmanuel Charpentier; r-help at stat.math.ethz.ch
Subject: Re: [R] Multiple Imputation in mice/norm

David Winsemius wrote:
> 
> On Apr 25, 2009, at 9:25 AM, Frank E Harrell Jr wrote:
> 
>> Emmanuel Charpentier wrote:
>>> On Friday, April 24, 2009, at 14:11 -0700, ToddPW wrote:
>>>> I'm trying to use either mice or norm to perform multiple imputation
>>>> to fill in some missing values in my data. The data has some missing
>>>> values because of a chemical detection limit (so they are left
>>>> censored). I'd like to use MI because I have several variables that
>>>> are highly correlated. In SAS's proc MI, there is an option with which
>>>> you can limit the imputed values that are returned to some range of
>>>> specified values. Is there a way to limit the values in mice?
>>> You may do that by writing your own imputation function and assigning
>>> it to the imputation of a particular variable (see the argument
>>> "imputationMethod" and details in the man page for "mice").
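>>> A minimal sketch of such a function (names illustrative, range made
>>> up): it wraps the built-in norm method and clamps the draws:
>>>
>>> mice.impute.normbounded <- function(y, ry, x, ...) {
>>>   ## draw imputations from mice's Bayesian linear regression
>>>   ## method, then truncate them to an assumed range [0, 10]
>>>   imp <- mice.impute.norm(y, ry, x, ...)
>>>   pmax(pmin(imp, 10), 0)
>>> }
>>> ## then e.g. mice(mydata, imputationMethod = c("", "normbounded", "pmm"))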
>>>> If not, is there another MI tool in R that will allow me to specify
>>>> a range of acceptable values for my imputed data?
>>> In the function amelia (package "Amelia"), you can specify a "bounds"
>>> argument, which allows for such a limitation. However, be aware that
>>> this might violate the basic assumption of Amelia, which is that your
>>> data are multivariate normal. Maybe a change of variable is in order
>>> (e.g. log(concentration) usually has much better statistical
>>> properties than concentration).
>>> Frank Harrell's aregImpute (package Hmisc) has the "curtail" argument
>>> (TRUE by default), which limits imputations to the range of observed
>>> values.
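>>>
>>> To make the "bounds" and "curtail" suggestions concrete, a sketch
>>> (the column index and bounds are made up; "mydata" is a placeholder):
>>>
>>> library(Amelia)
>>> b <- matrix(c(2, 0.01, 5), nrow = 1)  # bound column 2 to [0.01, 5]
>>> a <- amelia(mydata, m = 5, bounds = b)
>>> ## (aregImpute needs nothing extra: curtail = TRUE is its default)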
>>> But if your left-censored variables are your dependent variables (not
>>> covariates), may I suggest analyzing these data as censored data, as
>>> allowed by the "survreg" function in Terry Therneau's "survival"
>>> package? (coxph does not accept left-censored responses; survreg
>>> does.) Code your "missing" data as such a variable on the fly:
>>> survreg(Surv(pmin(x, <yourlimit>, na.rm = TRUE),
>>>              !is.na(x), type = "left") ~ <Yourmodel>)
>>> Another possible idea is to split your variable (x, say) in two:
>>> observed (logical), and value (the observed value if observed, the
>>> <detection limit> if not), and include these two variables in your
>>> model. You will probably run into numerical difficulties due to the
>>> built-in total separation.
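>>>
>>> Roughly (a sketch; "dl" stands for your detection limit):
>>>
>>> mydata$observed <- !is.na(mydata$x)                   # detected at all?
>>> mydata$xval <- ifelse(mydata$observed, mydata$x, dl)  # value or the limit
>>> ## then e.g. lm(y ~ observed + xval, data = mydata)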
>>> HTH,
>>>                     Emmanuel Charpentier
>>>> Thanks for the help,
>>>> Todd
>>>>
>>
>> All: see
>>
>> @Article{zha09non,
>>   author  = {Zhang, Donghui and Fan, Chunpeng and Zhang, Juan and
>>              Zhang, {Cun-Hui}},
>>   title   = {Nonparametric methods for measurements below detection
>>              limit},
>>   journal = {Statistics in Medicine},
>>   year    = 2009,
>>   volume  = 28,
>>   pages   = {700--715},
>>   annote  = {lower limit of detection; left censoring; Tobit model;
>>              Gehan test; Peto-Peto test; log-rank test; Wilcoxon test;
>>              location shift model; superiority of nonparametric methods}
>> }
>>
>>
>> -- 
>> Frank E Harrell Jr   Professor and Chair           School of Medicine
>>                     Department of Biostatistics   Vanderbilt University
>>
> 
> It appears they were dealing with outcomes possibly censored at a limit
> of detection. At least that was the example they used as illustration.
> 
> Is there a message that can be inferred about what to do with covariates
> with values below the limit of detection? And can someone translate for
> a non-statistician what the operational process was for values below the
> limit of detection in the Wilcoxon approach that they endorsed? They
> transformed the left-censored situation into a right-censored one and
> then they do ... what?
> 
> David Winsemius, MD
> Heritage Laboratories
> West Hartford, CT
> 
> 

Yes, it's easier to handle in the dependent variable. For independent
variables below the limit of detection, we are left with model-based
extrapolation for multiple imputation, with no way to check the
imputation model's regression assumptions. Predictive mean matching
can't be used.
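
To sketch the operational step David asks about (a reconstruction, not
the paper's code; x, dl, and group are placeholders): negate, so left
censoring at the detection limit becomes right censoring, then apply
the usual rank tests:

    library(survival)
    z  <- ifelse(is.na(x), -dl, -x)  # nondetects sit at -dl, right-censored
    ev <- !is.na(x)                  # 1 = detected (exact), 0 = censored
    survdiff(Surv(z, ev) ~ group, rho = 1)  # rho = 1: Peto-Peto; rho = 0: log-rank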

Frank

-- 
Frank E Harrell Jr   Professor and Chair           School of Medicine
                      Department of Biostatistics   Vanderbilt University

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



