[Rd] stats::line() does not produce correct Tukey line when n mod 6 is 2 or 3

Martin Maechler maechler at stat.math.ethz.ch
Wed May 31 22:00:18 CEST 2017


>>>>> Serguei Sokol <sokol at insa-toulouse.fr>
>>>>>     on Wed, 31 May 2017 18:46:34 +0200 writes:

    > On 31/05/2017 at 17:30, Serguei Sokol wrote:
    >> 
    >> A more thorough reading revealed that I had overlooked this phrase in
    >> line()'s documentation: "left and right /thirds/ of the data" (emphasis mine).
    > Oops. I had read the first reference returned by Google, and it happened to be
    > TIBCO's documentation, not R's. The layout is very similar, hence my mistake.
    > TIBCO's page does not mention "thirds", but ...
    > Anyway, here is a new patch for line() which still gives a result slightly different
    > from MMline(): the slope is the same, but the intercept is not.
    > Which exact terms should the intercept calculation implement?

    > Serguei.

Sorry Serguei, I have had a new version of line.c since yesterday,
and I will not be distracted from it any further.

Note that I *did* give the literature references, and it seems most
discussants no longer use paper books in physical libraries;
in this case, interestingly, I think you need one of those:
almost everything I found online lacked the exact details.

Peter Dalgaard was definitely right that Tukey did not use
quantiles at all, and notably did *not* define the three groups
via {i; x_i <= x_L} and {i; x_i >= x_R}, which (as I think
you noticed) can make the groups quite unbalanced when there are duplicated x values.
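A small illustration of the unbalanced-groups point, as a Python sketch (not R's code): with many tied x values, the threshold-defined groups {i; x_i <= x_L} and {i; x_i >= x_R} can swallow most of the data and even overlap, whereas a split by rank keeps n // 3 points in each outer group. Taking x_L and x_R as the sorted values at the inner edges of the rank-based outer thirds is an assumption made only for this example, and the rank split here simply ignores ties.

```python
def threshold_group_sizes(x, x_left, x_right):
    """Sizes of the groups {i: x_i <= x_left} and {i: x_i >= x_right}."""
    left = sum(1 for xi in x if xi <= x_left)
    right = sum(1 for xi in x if xi >= x_right)
    return left, right


def rank_group_sizes(n):
    """Sizes of the outer groups when each takes n // 3 points by rank (ties ignored)."""
    k = n // 3
    return k, k


x = [1, 2, 2, 2, 2, 2, 2, 3, 4]  # n = 9 with many duplicated x values
xs = sorted(x)
n = len(xs)
k = n // 3
x_left = xs[k - 1]   # largest x in the left third by rank (here 2)
x_right = xs[n - k]  # smallest x in the right third by rank (here 2)

print(threshold_group_sizes(x, x_left, x_right))  # (7, 8): 15 memberships for 9 points
print(rank_group_sizes(n))                         # (3, 3)
```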

For now, though, I have decided to fix the bug (namely, computing
the x-medians wrongly, as you correctly diagnosed(!), although your
first 2 patches fixed it only partly) *and* to go at least one step in
the direction of Tukey's original, namely by allowing iteration via a new 'iter' argument.
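To make the iteration idea concrete, here is a minimal Python sketch of a Tukey-style resistant line under stated assumptions; it is not the code in line.c. It assumes the usual textbook rank-based split into thirds (for n = 3k+1 the extra point goes to the middle third, for n = 3k+2 the two extra points go to the outer thirds), takes the slope from the x- and y-medians of the outer thirds, and then refines it by refitting the slope to the residuals. The parameter n_iter is a hypothetical stand-in for the new 'iter' argument. The intercept is taken as the mean of the three group-wise values y_med - b * x_med, which is one common formulation; a median of all residuals y - b*x is another, and this sketch makes no claim about which one line() uses.

```python
from statistics import median


def third_sizes(n):
    """Left/middle/right group sizes for a rank-based split into thirds (assumed rule)."""
    k, r = divmod(n, 3)
    if r == 0:
        return k, k, k
    if r == 1:
        return k, k + 1, k      # extra point goes to the middle third
    return k + 1, k, k + 1      # two extra points go to the outer thirds


def resistant_line(x, y, n_iter=1):
    """Sketch of a resistant line; returns (intercept, slope).

    n_iter is a hypothetical analogue of line()'s new 'iter' argument:
    each pass refits the slope to the current residuals and adds the correction.
    """
    n = len(x)
    if n < 3 or len(y) != n:
        raise ValueError("need x and y of equal length >= 3")
    order = sorted(range(n), key=lambda i: x[i])  # indices in increasing x
    n_left, n_mid, _ = third_sizes(n)
    groups = (order[:n_left],
              order[n_left:n_left + n_mid],
              order[n_left + n_mid:])
    # x-medians computed *within* each third (not from quantiles of all x).
    x_med = [median(x[i] for i in g) for g in groups]
    if x_med[2] == x_med[0]:
        raise ValueError("outer thirds have equal x-medians; slope undefined")

    slope = 0.0
    resid = list(y)
    for _ in range(n_iter):
        r_left = median(resid[i] for i in groups[0])
        r_right = median(resid[i] for i in groups[2])
        slope += (r_right - r_left) / (x_med[2] - x_med[0])
        resid = [y[i] - slope * x[i] for i in range(n)]

    # Intercept: mean of the three group-wise y_med - slope * x_med (assumed formulation).
    y_med = [median(y[i] for i in g) for g in groups]
    intercept = sum(ym - slope * xm for ym, xm in zip(y_med, x_med)) / 3
    return intercept, slope


x = [1, 2, 4, 7, 8, 10, 15, 16, 20]   # non-equidistant x
y = [2 + 3 * xi for xi in x]          # exactly on the line y = 2 + 3x
print(resistant_line(x, y, n_iter=3))  # (2.0, 3.0)
```

For data lying exactly on a line, the first pass already recovers the slope and later passes add a zero correction; iteration only matters when the residual medians of the outer thirds still differ after the first fit.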

I have also updated the help page to document what line() has
been computing all these years {apart from the bug, which
typically shows up for non-equidistant x[]}.
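The n mod 6 pattern in the subject line can be reproduced with a small Python experiment. It assumes (this is my reading, not code quoted from line.c) that the flawed version took the left "x-median" from sorted positions around (n-1)/6 of all x, averaging the order statistics at the floor and ceiling of that position, instead of taking the median of the x values in the left group {x <= x_1}, where x_1 is located the same way at position (n-1)/3. The helper names are hypothetical, and x is assumed to have no ties.

```python
from statistics import median


def order_stat_avg(z, num, den):
    """Mean of z[floor(h)] and z[ceil(h)] with h = (len(z) - 1) * num / den (0-based).

    Hypothetical quantile-style lookup; integer arithmetic avoids rounding issues.
    """
    h = (len(z) - 1) * num
    lo, hi = h // den, -(-h // den)
    return 0.5 * (z[lo] + z[hi])


def left_x_median(z):
    """Median of the x values in the left group {x <= x_1}, x_1 at position (n-1)/3."""
    x_1 = order_stat_avg(z, 1, 3)
    return median(v for v in z if v <= x_1)


def left_x_median_approx(z):
    """Assumed flawed shortcut: the value at position (n-1)/6 of all sorted x."""
    return order_stat_avg(z, 1, 6)


# Non-equidistant, tie-free x values: 0, 1, 4, 9, ...
mismatches = []
for n in range(6, 16):
    z = [float(i * i) for i in range(n)]
    if left_x_median(z) != left_x_median_approx(z):
        mismatches.append(n)

print(mismatches)                            # [8, 9, 14, 15]
print(sorted({n % 6 for n in mismatches}))   # [2, 3]
```

Under this assumption, for the n tested here the two values coincide when n mod 6 is 0, 1, 4 or 5 and differ when it is 2 or 3, matching the subject line; the right third can be checked the same way with positions (n-1)*2/3 and (n-1)*5/6.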

We could also eventually consider adding a new 'method = <string>'
argument to line(): one method would continue to
compute the current solution, another would compute the one
corresponding to Velleman & Hoaglin (1981)'s FORTRAN
implementation (which had to be corrected for some infinite-loop
cases!)... not in the near future, though.


Given all these discussions here, I think I should commit what I
currently have ASAP.

Martin

    > [DELETED ATTACHMENT line.c.patch2, plain text]



More information about the R-devel mailing list