[R] conservative robust estimation in (nonlinear) mixed models

Fri Mar 24 17:38:11 CET 2006

I believe that Bert's comments are a non sequitur.
I did not and do not propose identifying which components
of the model are contaminated by outliers. What I do propose
is the more or less routine use of conservative robust methods
to replace the normal theory estimators. By definition such estimators
are to be almost as efficient as the normal theory estimators in the 
case where the normal theory applies. One may argue that
conservative robust estimators do not exist for this class of
problems. I think they do, but the obvious way to establish this
claim is to carry out simulations.

Before such simulations can be carried out one must create the
software to do the analysis. So I am proposing to add that to our
R package glmmADMB. Then other R users can carry out their own
simulation analysis to investigate how the method performs.
I think that normal mixtures are better candidates for
conservative robust estimators than, say, Student's t distribution,
but I will try to include both (and perhaps any others that appear
useful).
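
As a rough illustration of the kind of simulation meant here, the
following Python sketch compares the sample mean with the maximum
likelihood location estimate under a normal mixture. It is not glmmADMB
code and not a mixed model: it assumes a plain location problem with
known unit scale, 5% contamination, and a contaminating standard
deviation 3 times larger (the settings from the original message quoted
below). The function names are made up for the example.

```python
import math
import random
import statistics

EPS, K = 0.05, 3.0  # assumed: 5% contamination, 3x standard deviation


def phi(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def mixture_location_mle(xs, eps=EPS, k=K, iters=20):
    """ML estimate of mu under (1-eps)*N(mu,1) + eps*N(mu,k^2), by EM."""
    mu = statistics.median(xs)  # robust starting value
    for _ in range(iters):
        num = den = 0.0
        for x in xs:
            r = x - mu
            core = (1.0 - eps) * phi(r)    # central component
            tail = (eps / k) * phi(r / k)  # contaminating component
            # EM weight: posterior component probabilities times precisions
            w = (core + tail / (k * k)) / (core + tail)
            num += w * x
            den += w
        mu = num / den
    return mu


def draw(n, contaminated, rng):
    """Sample n values with true location 0."""
    out = []
    for _ in range(n):
        sd = K if (contaminated and rng.random() < EPS) else 1.0
        out.append(rng.gauss(0.0, sd))
    return out


def variance_ratio(contaminated, reps=1000, n=30, seed=1):
    """Return var(sample mean) / var(mixture MLE) over simulated data sets."""
    rng = random.Random(seed)
    means, mles = [], []
    for _ in range(reps):
        xs = draw(n, contaminated, rng)
        means.append(statistics.fmean(xs))
        mles.append(mixture_location_mle(xs))
    # The true location is 0, so variances are taken about 0.
    return statistics.pvariance(means, 0.0) / statistics.pvariance(mles, 0.0)
```

Calling variance_ratio(False) gives the ratio for clean normal data and
variance_ratio(True) the ratio for contaminated data. A ratio above 1
means the mixture estimator has the smaller variance, and the ratio can
be read as the relative sample size the mean would need to match it.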

      Dave

> 	  Bert raised an issue I had overlooked.  Ideally, we would like to be 
> able to specify a different "family" for the observations and for each 
> random effect, with Student's t and contaminated normal as valid options 
> in both places.
> 
> 	  If I were allowed to specify a family (or a robust family) for either 
> observations or for random effects but not both, I think I'd pick the 
> observations.  I don't know, but I wonder if misspecification of the 
> observation distribution might create more problems with estimation and 
> inference than misspecification of the distribution of a random effect. 
>   As Bert indicated, there may be identifiability issues here, and the 
> choice of a model may depend on one's hypotheses about the situation 
> being modeled.
> 
> 	  spencer graves
> 
> Berton Gunter wrote:
> 
>> Ok, since Spencer has dived in, I'll go public (I made some prior private
>> remarks to David because I didn't think they were worth wasting the list's
>> bandwidth on. Heck, they may still not be...)
>> 
>> My question: isn't the difficult issue which levels of the (co)variance
>> hierarchy get longer tailed distributions rather than which distributions
>> are used to model long tails? Seems to me that there is an inherent
>> identifiability issue here, and even more so with nonlinear models. It's
>> easy to construct examples where it all essentially depends on your priors.
>> 
>> Cheers,
>> Bert
>> 
>> -- Bert Gunter
>> Genentech Non-Clinical Statistics
>> South San Francisco, CA
>>   
>>  
>> 
>> 
>>>-----Original Message-----
>>>From: r-help-bounces at stat.math.ethz.ch 
>>>[mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Spencer Graves
>>>Sent: Thursday, March 23, 2006 12:34 PM
>>>To: otter at otter-rsch.com
>>>Cc: r-help at stat.math.ethz.ch
>>>Subject: Re: [R] conservative robust estimation in 
>>>(nonlinear) mixed models
>>>
>>>	  I know of two fairly common models for robust methods.  One is the 
>>>contaminated normal that you mentioned.  The other is Student's t.  A 
>>>normal plot of the data or of residuals will often indicate whether the 
>>>assumption of normality is plausible or not;  when the plot indicates 
>>>problems, it will often also indicate whether a contaminated normal or 
>>>Student's t would be better.
>>>
>>>	  Using Student's t introduces one additional parameter.  A 
>>>contaminated normal would introduce 2;  however, in many applications, 
>>>the contamination proportion (or its logit) will be highly 
>>>correlated with the ratio of the contamination standard deviation to 
>>>that of the central portion of the distribution.  Thus, in some cases, 
>>>it's wise to fix the ratio of the standard deviations and estimate 
>>>only the contamination proportion.
>>>
>>>	  hope this helps.
>>>	  spencer graves
>>>
>>>dave fournier wrote:
>>>
>>>
>>>>Conservative robust estimation methods do not appear to be
>>>>currently available in the standard mixed model methods for R,
>>>>where by conservative robust estimation I mean methods which
>>>>work almost as well as the methods based on assumptions of
>>>>normality when the assumption of normality *IS* satisfied.
>>>>
>>>>We are considering adding such a conservative robust estimation option
>>>>for the random effects to our AD Model Builder mixed model package,
>>>>glmmADMB, for R, and perhaps extending it to do robust estimation for 
>>>>linear mixed models at the same time.
>>>>
>>>>An obvious candidate is to assume something like a mixture of
>>>>normals. I have tested this in a simple linear mixed model
>>>>using 5% contamination with a normal with 3 times the standard 
>>>>deviation, which seems to be
>>>>a common assumption. Simulation results indicate that when the
>>>>random effects are normally distributed this estimator is about
>>>>3% less efficient, while when the random effects are contaminated with
>>>>5% outliers the estimator is about 23% more efficient, where by 23%
>>>>more efficient I mean that one would have to use a sample size about
>>>>23% larger to obtain the same size confidence limits for the
>>>>parameters.
>>>>
>>>>Question?
>>>>
>>>>I wonder if there are other distributions besides a mixture of normals
>>>>which might be preferable. Three things to keep in mind are:
>>>>
>>>>    1.)  It should be likelihood based so that the standard likelihood
>>>>         based tests are applicable.
>>>>
>>>>    2.)  It should work well when the random effects are normally
>>>>         distributed so that things that are already fixed don't get
>>>>         broke.
>>>>
>>>>    3.)  In order to implement the method efficiently it is necessary to
>>>>         be able to produce code for calculating the inverse of the
>>>>         cumulative distribution function. This enables one to extend
>>>>         methods based on the Laplace approximation for the random
>>>>         effects (i.e. the Laplace approximation itself, adaptive
>>>>         Gaussian integration, adaptive importance sampling) to the new
>>>>         distribution.
>>>>
>>>>      Dave
>>>>
>>>
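
On point 3 of the original message quoted above: a normal mixture has no
closed-form inverse CDF, but its CDF is a weighted sum of normal CDFs and
is strictly increasing, so the inverse can be computed numerically. A
minimal sketch in Python, assuming a zero-mean two-component mixture with
the same 5% / 3x settings and plain bisection. This is only an
illustration; a version for AD Model Builder would presumably need to be
differentiable and faster, and the function names here are hypothetical.

```python
from statistics import NormalDist


def mixture_cdf(x, eps=0.05, k=3.0):
    """CDF of (1 - eps) * N(0, 1) + eps * N(0, k^2)."""
    return ((1.0 - eps) * NormalDist(0.0, 1.0).cdf(x)
            + eps * NormalDist(0.0, k).cdf(x))


def mixture_inv_cdf(p, eps=0.05, k=3.0, tol=1e-10):
    """Quantile of the mixture, found by bisection on mixture_cdf."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    # The mixture quantile always lies between the two component quantiles,
    # which gives a valid starting bracket.
    q_core = NormalDist(0.0, 1.0).inv_cdf(p)
    q_tail = NormalDist(0.0, k).inv_cdf(p)
    lo, hi = min(q_core, q_tail), max(q_core, q_tail)
    for _ in range(200):  # hard cap; the tolerance is normally hit far sooner
        if hi - lo <= tol:
            break
        mid = 0.5 * (lo + hi)
        if mixture_cdf(mid, eps, k) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

Applying mixture_inv_cdf to a uniform draw, or to the normal CDF of a
standard normal variate, yields a variate with the mixture distribution.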

-- 
David A. Fournier
P.O. Box 2040,
Sidney, B.C. V8l 3S3
Canada
Phone/FAX 250-655-3364
http://otter-rsch.com