[Rd] stats::line() does not produce correct Tukey line when n mod 6 is 2 or 3

Wed May 31 16:40:30 CEST 2017

And with "equally spaced" I obviously meant "of equal size". It's getting
too hot in the office here...
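
For the record, the counting rule quoted below can be sketched in a few
lines of R. This is only a rough illustration under stated assumptions, not
what line() does and not a full implementation of Tukey's method: the helper
name tukey_slope_sketch is made up, the data are invented (not the dfr from
this thread), ties in x are ignored, and the plain median() is used, which is
itself one of the choices still open for discussion.

```r
# Hypothetical helper, not part of the stats package.
# Outer groups are chosen by count (floor(n / 3) points each),
# not by comparing x against a quantile.
tukey_slope_sketch <- function(x, y) {
  n <- length(x)
  stopifnot(n >= 3, length(y) == n)
  k <- floor(n / 3)                    # size of each outer group
  ord <- order(x)                      # indices of x in increasing order
  left  <- ord[seq_len(k)]             # the k smallest x values
  right <- ord[seq.int(n - k + 1, n)]  # the k largest x values
  # Slope through the (median x, median y) points of the outer groups
  slope <- (median(y[right]) - median(y[left])) /
           (median(x[right]) - median(x[left]))
  list(k = k, left = left, right = right, slope = slope)
}

# Invented data: 18 points lying on an exact line
x <- 1:18
y <- 2 * x + 1
fit <- tukey_slope_sketch(x, y)
```

With n = 18 each outer group gets floor(18 / 3) = 6 points, and because the
invented points lie on an exact line, fit$slope comes back as 2.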

On Wed, May 31, 2017 at 4:39 PM, Joris Meys <jorismeys at gmail.com> wrote:

> Seriously, if a method gives a wrong result, it's wrong. line() does NOT
> implement the algorithm of Tukey, not even after the patch. We're not
> discussing Excel here, are we?
>
> The method of Tukey is rather clear, and it is NOT using the default
> quantile definition from the quantile function. Actually, it doesn't even
> use quantiles to define the groups. It just says that the groups should be
> more or less equally spaced. As the method of Tukey relies on the medians
> of the subgroups, it would make sense to pick a method that is
> approximately unbiased with regard to the median. That would be type 8
> imho.
>
> To get the size of the outer groups, Tukey would have been more than happy
> with:
>
> > floor(length(dfr$time) / 3)
> [1] 6
>
> There you have the size of your left and right groups, and now we can
> discuss which median type should be used for the robust fitting.
>
> But I can honestly not understand why anyone in his right mind would
> defend a method that is clearly wrong while not working at Microsoft's
> spreadsheet department.
>
> Cheers
> Joris
>
> On Wed, May 31, 2017 at 4:03 PM, Serguei Sokol <sokol at insa-toulouse.fr>
> wrote:
>
>> On 31/05/2017 at 15:40, Joris Meys wrote:
>>
>>> OTOH,
>>>
>>> > sapply(1:9, function(i){
>>> +   sum(dfr$time <= quantile(dfr$time, 1./3., type = i))
>>> + })
>>> [1] 8 8 6 6 6 6 8 6 6
>>>
>>> Only the default (type = 7) and the first two types give the result
>>> line() gives now. I think there are plenty of reasons why any of
>>> the other 6 types might be better suited to Tukey's method.
>>>
>>> So to my mind, changing the definition of line() to give sensible output
>>> that is in accordance with the theory does not imply any inconsistency
>>> with the quantile definition in R. At least not with 6 out of the 9
>>> different ones ;-)
>>>
>> Nice shot.
>> But OTOE (on the other end ;)
>> > sapply(1:9, function(i){
>> +   sum(dfr$time >= quantile(dfr$time, 2./3., type = i))
>> + })
>> [1] 8 8 8 8 6 6 8 6 6
>>
>> Here "8" gets 5 votes against 4 for "6". Two defector methods (types 3
>> and 4) changed their point count between the two ends and should be
>> discarded. That leaves a score of 3:4, still in favor of "6", but in my
>> opinion the default method should prevail.
>>
>> Serguei.
>>

-- 
Joris Meys
Statistical consultant

Ghent University
Faculty of Bioscience Engineering
Department of Mathematical Modelling, Statistics and Bio-Informatics

tel :  +32 (0)9 264 61 79
Joris.Meys at Ugent.be
-------------------------------
Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php
