[R] Huber location estimate

Murray Jorgensen maj at stats.waikato.ac.nz
Thu Dec 22 11:42:30 CET 2005


D'oh!  Apologies for wasting everybody's time!

Murray

Martin Maechler wrote:
>>>>>>"Murray" == Murray Jorgensen <maj at stats.waikato.ac.nz>
>>>>>>    on Thu, 22 Dec 2005 22:13:45 +1300 writes:
> 
> 
>     Murray> Prof Brian Ripley wrote:
>     >> On Thu, 22 Dec 2005, Murray Jorgensen wrote:
>     >> 
>     >>> We have a choice when calculating the Huber location estimate:
>     >>> > set.seed(221205)
>     >>> > y <- 7 + 3*rt(30,1)
>     >> 
>     >> 
>     >> That's Cauchy, BTW, a very extreme case.
> 
>     Murray> Sure, the sort of situation where one might want a robust estimator.
> 
>     Murray> [...]
> 
>     >> Note that the huber() scale estimate is the initial MAD, whereas rlm 
>     >> iterates.  Because during iteration the 'center' for MAD is known to be 
>     >> zero, the results differ.  For symmetric distributions there is little 
>     >> difference, but your sample is not close to symmetric.
> 
>     Murray> [Blush] yes I knew that and somehow I forgot. But leave rlm() alone for 
>     Murray> a while and do IRLS with fixed scale:
> 
>     Murray> th <- median(y)
>     Murray> s <- mad(y)
>     Murray> # paste this in a few times:
>     Murray> w <- ifelse((y-th< 1.345*s & y-th>-1.345*s), 1, 1.345*s/abs(y-th))
>     Murray> th <- weighted.mean(y,w)
>     Murray> th
> 
>     Murray> We converge to
>     >> th
>     Murray> [1] 5.9203
>     Murray> close to the answer given by rlm() different from
>     >> huber(y)$mu
>     Murray> [1] 5.9117
> 
>     Murray> So the variable scale does not account for the difference.
> 
> No, the main difference is the different default:  
>     huber() has  k=1.5
> and rlm()   has  k=1.345 :
> 
> Try this
> 
> set.seed(221205)
> y <- 7 + 3*rt(30,1)
> 
> str(huber(y, k = 1.345), digits = 5)
> ## List of 2
> ##  $ mu: num 5.9203
> ##  $ s : num 4.0967
> 
> str(rlm(y ~ 1)[c("coefficients", "s")], digits = 5) #
> ## (edited to)
> ##  $ coefficients: num 5.9204
> ##  $ s           : num 3.7463
> 
> which gives 'mu' very close, even for the iterated
> vs. non-iterated scales.
> 
> Martin Maechler, ETH Zurich

-- 
Dr Murray Jorgensen      http://www.stats.waikato.ac.nz/Staff/maj.html
Department of Statistics, University of Waikato, Hamilton, New Zealand
Email: maj at waikato.ac.nz                                Fax 7 838 4155
Phone  +64 7 838 4773 wk    Home +64 7 825 0441    Mobile 021 1395 862




More information about the R-help mailing list