# [Rd] stats::line() does not produce correct Tukey line when n mod 6 is 2 or 3

Serguei Sokol sokol at insa-toulouse.fr
Wed May 31 15:06:43 CEST 2017

On 30/05/2017 at 18:51, Martin Maechler wrote:
>>>>>> Serguei Sokol <sokol at insa-toulouse.fr>
>>>>>>      on Tue, 30 May 2017 16:01:17 +0200 writes:
>      > On 30/05/2017 at 09:33, Martin Maechler wrote: ...
>      >> However, even after the patch, the example from the SO
>      >> post differs from the result of Richie Cotton's
>      >> function...
>      > The explanation is quite simple. In SO function, the first
>      > 1/3 quantile of used example counts 6 points (of 19 in
>      > total), while line()'s definition of quantile leads to 8
>      > points. The same numbers (6 and 8) are on the other end of
>      > sample.
>
> so the number of obs. for the three thirds for line() are
>     {8, 3, 8}  in line()  [also, after your patch, right?]
>
> whereas in MMline() they are as they should be, namely
>
>     {6, 7, 6}
>
> But the  {8, 3, 8}  split is not at all what all "the literature",
> including Tukey himself says that "should" be done.
> (Other literature on the topic suggests that the optimal sizes
>   of the split in three groups depends on the distribution of x ..)
>
> OTOH, MMline() does exactly what "the literature" and also  the
> reference on the  ?line  help pages says.
Well, what I have seen so far in the "literature" was a mention of 1/3 quantiles
(but yes, I may have overlooked something, as I did not spend much time on it).
So how the sample is distributed among the three groups boils down to which quantile
definition is used. It turns out that line()'s version (you are right, _after_ the patch,
but my patch left this definition untouched) is consistent with R's own.
If you do in R sum(dfr$time <= quantile(dfr$time, 1./3.)) you get 8, not 6
(and the same on the 2/3 end).
To my mind, consistency with the rest of R, namely with its quantile definition,
is a good enough argument to leave line()'s definition as is.
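
A minimal sketch of that kind of count, under stated assumptions: the SO data
frame dfr is not reproduced in this message, so the vector x below is a made-up
stand-in with tied values, and third_counts() is an illustrative helper, not
something from the stats package. It only tallies how many points fall at or
beyond the 1/3 and 2/3 quantiles as returned by quantile() with its default type.

```r
# Illustrative helper (hypothetical, not part of R): sizes of the three groups
# obtained by cutting x at its 1/3 and 2/3 sample quantiles, using the default
# 'type' of quantile(). Assumes the two quantiles differ, so the outer groups
# do not overlap.
third_counts <- function(x) {
  q <- quantile(x, c(1, 2) / 3, names = FALSE)
  left  <- sum(x <= q[1])   # points at or below the 1/3 quantile
  right <- sum(x >= q[2])   # points at or above the 2/3 quantile
  c(left = left, middle = length(x) - left - right, right = right)
}

# Made-up stand-in for dfr$time: 19 values with ties (NOT the SO data).
x <- c(1, 2, 2, 3, 3, 3, 4, 4, 5, 6, 7, 7, 8, 8, 8, 9, 9, 10, 11)
print(third_counts(x))
```

Every point equal to a quantile value is counted in the corresponding outer
group, so with tied values those groups can come out larger than n/3.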

Serguei.