[Rd] stats::line() does not produce correct Tukey line when n mod 6 is 2 or 3

Duncan Murdoch murdoch.duncan at gmail.com
Mon May 29 00:40:14 CEST 2017


On 27/05/2017 9:28 PM, GlenB wrote:
> Bug: stats::line() does not produce correct Tukey line when n mod 6 is 2 or
> 3
>
> Example: line(1:9,1:9) should have intercept 0 and slope 1 but it gives
> intercept -1 and slope 1.2
>
> Trying line(1:i,1:i) across a range of i makes it clear there's a cycle of
> length 6, with four of every six correct.
>
> Bug has been present across many versions.
>
> The machine I just tried it on has R 3.2.3:

If you look at the source (in src/library/stats/src/line.c), the 
explanation is clear: for the lower third, the x value is chosen as the 
1/6 quantile of x (according to a particular definition of quantile), 
while the y value is chosen as the median of the y values whose x is 
less than or equal to the 1/3 quantile, and the upper third is handled 
the same way.  These are two different summaries (though I think they 
would be asymptotically equivalent under pretty weak assumptions), so 
it's not surprising that the x value doesn't correspond exactly to the 
y value, and the line ends up "wrong".

So is it a bug?  Well, that depends on Tukey's definition.  I don't have 
a copy of his book handy, so I can't really say.  Maybe the R function is 
doing exactly what Tukey said it should, in which case it's not a bug.  Or 
maybe R is wrong.

Duncan Murdoch
