[Rd] stats::line() does not produce correct Tukey line when n mod 6 is 2 or 3

GlenB glnbrntt at gmail.com
Wed May 31 06:13:31 CEST 2017


Martin Maechler says in reply to Sergueï Sokol:

> Note the 'Subject' you've chosen for this thread,
>  "... does not produce the correct Tukey line",

The choice of title was mine, not Sergueï's; I posted the original message
in which the error was pointed out.

I agree with Martin's assessment that the correct split (both by Tukey's
lights and by general practice) for 19 points would be 6, 7, 6, and I also
agree that it's better to "fix more" in this instance, where possible
(e.g., Johnstone and Velleman's standard errors would be a nice thing to add,
if feasible). But if any blame is attached to the choice of title, it really
should be aimed at me.
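For reference, here is a minimal Python sketch of the group-size rule as it is commonly stated for Tukey's three-group resistant line: when n is one more than a multiple of 3, the extra point goes to the middle group; when it is two more, one extra point goes to each outer group. The function name is hypothetical, and ties in x (which can force points to move between groups) are deliberately ignored.

```python
def tukey_third_sizes(n):
    """Hypothetical helper: sizes of the (left, middle, right) groups for a
    three-group resistant line on n points, ignoring ties in x."""
    if n < 3:
        raise ValueError("need at least 3 points")
    k, r = divmod(n, 3)
    if r == 0:
        return (k, k, k)
    if r == 1:
        # One leftover point goes to the middle group.
        return (k, k + 1, k)
    # Two leftover points: one goes to each outer group.
    return (k + 1, k, k + 1)


print(tukey_third_sizes(19))  # (6, 7, 6)
```

Under this rule 19 = 3 * 6 + 1 gives 6, 7, 6, matching the split Martin describes, whereas an {8, 3, 8} split cannot arise from it for any n.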

Glen

On Wed, May 31, 2017 at 2:51 AM, Martin Maechler <maechler at stat.math.ethz.ch> wrote:

> >>>>> Serguei Sokol <sokol at insa-toulouse.fr>
> >>>>>     on Tue, 30 May 2017 16:01:17 +0200 writes:
>
>     > On 30/05/2017 at 09:33, Martin Maechler wrote: ...
>     >> However, even after the patch, the example from the SO
>     >> post differs from the result of Richie Cotton's
>     >> function...
>     > The explanation is quite simple. In the SO function, the first
>     > third of the example data (cut at the 1/3 quantile) contains 6
>     > points (of 19 in total), while line()'s definition of the quantile
>     > leads to 8 points. The same numbers (6 and 8) occur at the other
>     > end of the sample.
>
> So the numbers of observations in the three thirds are
>    {8, 3, 8}  for line()  [also after your patch, right?]
>
> whereas in MMline() they are as they should be, namely
>
>    {6, 7, 6}
>
> But the {8, 3, 8} split is not at all what "the literature",
> including Tukey himself, says "should" be done.
> (Other literature on the topic suggests that the optimal sizes
>  of the three groups depend on the distribution of x.)
>
> OTOH, MMline() does exactly what "the literature", and also the
> reference on the ?line help page, says.
>
>     > In the x sample there are a few repeated values; this
>     > is certainly the reason for the different quantiles.
>
>     > I am not sure that one quantile definition is better or
>     > more correct than the other.
>
>     > So I would leave line()'s definition as is.
>
> You mean _after_ applying your patch, I assume.
>
> I currently tend to disagree. If we change line(), we should
> rather fix more.
> Note the 'Subject' you've chosen for this thread,
>  "... does not produce the correct Tukey line",
> so I think we should do better.
>
> Apart from Richie's / my MMline() function, I've also noticed
> that ACSWR::resistant_line()
> exists.
>
> However, "the literature" (see references below), notably the two
> references with Hoaglin, strongly recommends smarter iterations, and
> (lo and behold!) when this topic last came up (for me) in
> Dec. 2014, I did spend about 2 days of work (or more?) getting the
> FORTRAN code from the 1981 book (abbreviated as the
> "ABC of EDA") from a somewhat useful OCR scan into compilable
> Fortran code; I then f2c'ed it, wrote an R interface function,
> found problems (i.e., bugs, including infinite loops) and fixed most
> AFAICS, but somehow did not finish making the result available.
>
> Yes, and I have too many other things on my desk... this will
> have to wait!
>
> References:
>
>      Tukey, J. W. (1977) _Exploratory Data Analysis_. Reading,
>      Massachusetts: Addison-Wesley.
>
>      Velleman, P. F. and Hoaglin, D. C. (1981) _Applications, Basics
>      and Computing of Exploratory Data Analysis_. Duxbury Press.
>
>      Emerson, J. D. and Hoaglin, D. C. (1983) Resistant Lines for y
>      versus x.  Chapter 5 of _Understanding Robust and Exploratory Data
>      Analysis_, eds. David C. Hoaglin, Frederick Mosteller and John W.
>      Tukey.  Wiley.
>
>      Johnstone, I. M. and Velleman, P. F. (1985) The Resistant Line
>      and Related Regression Methods.  _Journal of the American
>      Statistical Association_ *80*, 1041-1054.  <URL:
>      https://dx.doi.org/10.1080/01621459.1985.10478222>
>
>
>     > Best, Sergueï.
>
> Martin Maechler, ETH Zurich (and R core team)
>
> ______________________________________________
> R-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>
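To make the idea of "smarter iterations" concrete, here is a minimal Python sketch of a three-group resistant line with iterative slope polishing: the slope is repeatedly corrected by the slope through the outer-group medians of the current residuals. This is not the ABC of EDA FORTRAN code, not MMline(), and not what stats::line() does. The function name, the max_iter/tol parameters, and the intercept convention (mean of the three groups' median residuals) are assumptions, and ties in x are ignored. The loop is capped because this simple iteration is not guaranteed to converge for every data set.

```python
from statistics import median


def resistant_line(x, y, max_iter=10, tol=1e-10):
    """Hypothetical sketch of a three-group resistant line with iterative
    slope polishing. Returns (intercept, slope). Ties in x are ignored."""
    n = len(x)
    if n < 3 or len(y) != n:
        raise ValueError("need at least 3 paired observations")
    # Sort the points by x.
    pairs = sorted(zip(x, y))
    xs = [p[0] for p in pairs]
    ys = [p[1] for p in pairs]
    # Outer groups get k points, or k + 1 when n % 3 == 2; the middle gets the rest.
    k, r = divmod(n, 3)
    m = k + 1 if r == 2 else k
    left, mid, right = slice(0, m), slice(m, n - m), slice(n - m, n)
    xl, xr = median(xs[left]), median(xs[right])
    if xr == xl:
        raise ValueError("outer groups have equal median x")
    slope = 0.0
    resid = list(ys)
    for _ in range(max_iter):
        # Slope through the outer-group medians of the residuals; the first
        # pass (slope 0) gives the basic Tukey slope (yR - yL) / (xR - xL).
        delta = (median(resid[right]) - median(resid[left])) / (xr - xl)
        slope += delta
        resid = [yi - slope * xi for xi, yi in zip(xs, ys)]
        if abs(delta) <= tol:
            break
    # Intercept: mean of the three groups' median residuals (one convention of several).
    intercept = (median(resid[left]) + median(resid[mid]) + median(resid[right])) / 3
    return intercept, slope


xs = list(range(19))
ys = [2 + 3 * v for v in xs]
ys[9] = 1000  # a gross outlier in the middle group
print(resistant_line(xs, ys))  # (2.0, 3.0)
```

The outlier leaves the fit untouched because only group medians enter the slope and intercept, which is the resistance property the thread is about.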
