[Rd] stats::line() does not produce correct Tukey line when n mod 6 is 2 or 3
Serguei Sokol
sokol at insa-toulouse.fr
Wed May 31 15:06:43 CEST 2017
On 30/05/2017 at 18:51, Martin Maechler wrote:
>>>>>> Serguei Sokol <sokol at insa-toulouse.fr>
>>>>>> on Tue, 30 May 2017 16:01:17 +0200 writes:
> > On 30/05/2017 at 09:33, Martin Maechler wrote: ...
> >> However, even after the patch, the example from the SO
> >> post differs from the result of Richie Cotton's
> >> function...
> > The explanation is quite simple. In the SO function, the first
> > third of the example data contains 6 points (of 19 in
> > total), while line()'s definition of quantile leads to 8
> > points. The same numbers (6 and 8) occur at the other end of
> > the sample.
>
> so the numbers of obs. in the three thirds are
> {8, 3, 8} in line() [also, after your patch, right?]
>
> whereas in MMline() they are as they should be, namely
>
> {6, 7, 6}
>
> But the {8, 3, 8} split is not at all what "the literature",
> including Tukey himself, says "should" be done.
> (Other literature on the topic suggests that the optimal sizes
> of the split into three groups depend on the distribution of x ..)
>
> OTOH, MMline() does exactly what "the literature" and also the
> reference on the ?line help page say.
Well, what I have seen so far in the "literature" mentions 1/3 quantiles
(though, yes, I may have overlooked something, as I did not spend much time on it).
So how the sample is distributed over the three groups boils down to which
quantile definition is used. It turns out that line()'s version (you are right,
_after_ the patch, though my patch left this definition untouched) is consistent
with R's own.
If you run sum(dfr$time <= quantile(dfr$time, 1./3.)) in R, you get 8, not 6
(and the same at the 2/3 end).
To my mind, consistency with the rest of R, namely with its quantile definition,
is a good enough argument to leave line()'s definition as is.
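
To make the counting difference concrete, here is a minimal sketch. The
vector x is made up for illustration (19 values with ties at the tertile
positions); it is NOT the dfr$time data from the SO post. The rank-based
rule (floor(n/3) points in each outer group) is only an assumed stand-in
for an "equal thirds" split, not necessarily what MMline() does.

```r
# Hypothetical data: 19 values with ties where the 1/3 and 2/3
# quantiles fall (not the data from the SO post).
x <- c(1, 2, 3, 4, 5, 6, 7, 7, 8, 9, 10, 11, 11, 12, 13, 14, 15, 16, 17)
n <- length(x)

# Quantile-based thirds, using R's default quantile() definition (type 7):
# count the points at or below the 1/3 quantile and at or above the 2/3 one.
q <- quantile(x, c(1, 2) / 3, names = FALSE)
n_left  <- sum(x <= q[1])
n_right <- sum(x >= q[2])
by_quantile <- c(n_left, n - n_left - n_right, n_right)

# Rank-based thirds (assumed rule): floor(n/3) points in each outer group,
# the remainder in the middle group.
k <- n %/% 3
by_rank <- c(k, n - 2 * k, k)

print(by_quantile)
print(by_rank)
```

With these made-up values, the quantile-based counts come out as 8, 3, 8
and the rank-based ones as 6, 7, 6, i.e. the same pattern as discussed above.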
Serguei.