[R] Bug in Kendall for n<4?

Sun Nov 23 04:24:11 CET 2008

The package Kendall computes the p-value when there are ties in one
ranking. This often happens with trend testing with environmental data. I
get about 5-10 emails per year from scientists using Kendall for that
purpose who don't know how to use R very well. I suspect this means there
are many users of this package.

Thank you, though, for your comments.  I will improve Kendall and its
documentation by terminating with an error message when n<=3 (this case
is of no interest to me) and issuing a warning when n<12 that the
p-values may be inaccurate. My student Paul Valz, in his Ph.D. thesis,
discussed an enumeration algorithm for the exact p-value computation for
any n with arbitrary ties in both variables -- but the algorithm is
complex and for practical purposes, I prefer to use the algorithm in
Kendall -- especially for trend testing with block bootstrap. That is the
reason for the existence of this package.
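
A minimal sketch of the two checks described above (error when n<=3,
warning when n<12). The helper name check_kendall_n and the message
wording are hypothetical and are not part of the Kendall package; the
sketch also assumes n should count complete pairs only.

```r
# Hypothetical helper, not the actual Kendall package code.
check_kendall_n <- function(x, y) {
  stopifnot(length(x) == length(y))
  # Assumption: n is the number of pairs with no missing values.
  n <- sum(!(is.na(x) | is.na(y)))
  if (n <= 3) {
    stop("n = ", n, ": at least 4 complete pairs are required")
  }
  if (n < 12) {
    warning("n = ", n, " < 12: p-values may be inaccurate")
  }
  invisible(n)
}
```

Calling check_kendall_n(1:3, 1:3) stops with an error, while
check_kendall_n(1:10, 1:10) continues with a warning.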

Valz's algorithm was published in JCGS but I believe there is a
mistake, so I don't use it.  The approximate algorithm, for p-values, that
is used in Kendall, has been extensively tested.

Also, I doubt if the current p-values from cor.test are correct for small
n and I notice that ties in one ranking do produce a warning.
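
The warning mentioned above can be seen with a small example. The data
here are made up purely for illustration; x has one tie and y has none.

```r
# Made-up illustrative data; x contains a tie, y does not.
x <- c(1, 2, 2, 3, 4, 5)
y <- c(1, 3, 2, 4, 6, 5)

# Capture the warning text instead of letting it print at the end.
msg <- tryCatch(
  {
    cor.test(x, y, method = "kendall")
    NA_character_
  },
  warning = function(w) conditionMessage(w)
)
print(msg)
```

With ties present, cor.test does not compute the exact p-value and
falls back to a normal approximation.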

Finally, I will also make it clearer in the documentation that cor and
cor.test are alternative functions which may be more appropriate for
some users.
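
For those users, a short sketch of the base R alternatives: cor gives
the estimate of tau and cor.test adds a p-value. The data are made up
for illustration and have no ties.

```r
# Made-up illustrative data without ties.
x <- c(1, 2, 3, 4, 5)
y <- c(2, 1, 4, 3, 5)

# Kendall's tau only: 8 concordant and 2 discordant pairs out of 10.
tau <- cor(x, y, method = "kendall")

# Tau together with a test of the null hypothesis tau = 0.
res <- cor.test(x, y, method = "kendall")

print(tau)
print(res$estimate)
print(res$p.value)
```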

Ian McLeod

> On Sat, Nov 22, 2008 at 9:04 AM, Martin Maechler
> <maechler at stat.math.ethz.ch> wrote:
>>    SM> I believe Kendall tau is well-defined for this case...
>>
>> The real question is  *WHY* there needs to be a separate package
>> 'Kendall'  when R itself does everything you want and does not show any
>> problems?
>
> Thanks for pointing me to cor(...,method="kendall"), which I did not
> know about; I used the Kendall CRAN package out of pure ignorance.
>
> In my defense, I think it is excusable ignorance, as Search on the R
> Project home page finds the Kendall package (which only mentions cor
> as a "See Also").  I only more recently discovered the advantages of
> help.search.
>
> By the way, is Kendall well-defined when the arguments are not
> permutations of each other?  cor seems to return results even in this
> case:
>
>    a<-factor(c("Alice","Bob","Chris"))
>    b<-a[1:2]
>    c<-a[2:3]
>    cor(b,c,method="kendall")
>        =>  1
>
> apparently interpreting b as c(1,2) and c as c(1,2) based on
> alphabetical order (even though it is an UNordered factor), which
> seems to make the value depend on the subjects' names, which I'd think
> was wrong for a rank-order statistic.
>
> Thanks again,
>
>            -s
>