[Rd] Numerical stability in chisq.test
Kurt.Hornik at wu.ac.at
Thu Dec 28 13:08:11 CET 2017
>>>>> Jan Motl writes:
> The chisq.test on line 57 contains following code:
> STATISTIC <- sum(sort((x - E)^2/E, decreasing = TRUE))
The preceding 2 lines seem relevant:
## Sorting before summing may look strange, but seems to be
## a sensible way to deal with rounding issues (PR#3486):
STATISTIC <- sum(sort((x - E) ^ 2 / E, decreasing = TRUE))
> However, based on book "Accuracy and stability of numerical algorithms" available from:
> Table 4.1 on page 89, it is better to sort the data in increasing order than in decreasing order, when the data are non-negative.
> An example:
> x = matrix(c(rep(1.1, 10000)), 10^16, nrow = 10001, ncol = 1) # We have a vector with 10000*1.1 and 1*10^16
> c(sum(sort(x, decreasing = TRUE)), sum(sort(x, decreasing = FALSE)))
> The result:
> 10000000000010996 10000000000011000
> When we sort the data in the increasing order, we get the correct result. If we sort the data in the decreasing order, we get a result that is off by 4.
> Shouldn't the sort be in the increasing order rather than in the decreasing order?
> Best regards,
> Jan Motl
> PS: This post is based on discussion on https://stackoverflow.com/questions/47847295/why-does-chisq-test-sort-data-in-descending-order-before-summation and the response from the post to r-help at r-project.org.
> R-devel at r-project.org mailing list
More information about the R-devel