[R] Bug in Kendall for n<4?

Mon Nov 24 11:24:18 CET 2008

Dear Ian,

thanks a lot for your clarifications.

>>>>> "AIM" == A I McLeod <aim at stats.uwo.ca>
>>>>>     on Sat, 22 Nov 2008 22:24:11 -0500 (EST) writes:

    AIM> The package Kendall computes the p-value when there are
    AIM> ties in one ranking. This often happens with trend
    AIM> testing with environmental data. I get about 5-10
    AIM> emails per year from scientists using Kendall for that
    AIM> purpose who don't know how to use R very well. I
    AIM> suspect this means there are many users of this
    AIM> package.

Indeed, the case of ties in the data is an important one in
possibly many applications, and indeed, cor.test() is [...],
and hence the Kendall package is serving an important need!

I do apologize for my impolite wording to which I was led by
the example (and 'Subject').
If the topic is just *computation* of Kendall's tau, I don't
think anyone should use the Kendall package.
If, however, one is interested in P-values of (H0:  tau = 0),
your Kendall package is indeed a valuable asset!
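
The distinction between computing tau and testing H0: tau = 0 can be
sketched with base R alone. This is only an illustration: the data
below are made up, and the Kendall package itself is not called.

```r
# Hypothetical toy data; y contains ties
x <- 1:8
y <- c(1, 2, 2, 3, 5, 5, 6, 8)

# Point estimate only: base R's cor() returns Kendall's tau
tau <- cor(x, y, method = "kendall")

# Test of H0: tau = 0. With ties, cor.test() warns that it cannot
# compute an exact p-value; the warning is suppressed here for brevity
res <- suppressWarnings(cor.test(x, y, method = "kendall"))

res$estimate   # the same tau as computed by cor()
res$p.value    # p-value for H0: tau = 0
```

The estimate is identical in both calls; only the p-value is where the
treatment of ties matters.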

    AIM> Thank you though for your comments.  So I will improve
    AIM> the documentation for Kendall by terminating the
    AIM> program with an error message when n<=3 (this case is
    AIM> of no interest to me) and warning message when n<12
    AIM> that the p-values may be inaccurate. My student Paul
    AIM> Valz in his Ph.D. thesis discussed an enumeration
    AIM> algorithm for the exact p-value computation for any n
    AIM> with arbitrary ties in both variables -- but the
    AIM> algorithm is complex and for practical purposes, I
    AIM> prefer to use the algorithm in Kendall -- especially
    AIM> for trend testing with block bootstrap. That is the
    AIM> reason for the existence of this package.
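
The sample-size checks described above (an error for n <= 3, a warning
for n < 12) might look roughly like the following. This is a sketch
only: kendall_checked is a hypothetical name, the messages are
invented, and base R's cor.test() stands in for the package's own
p-value algorithm.

```r
# Hypothetical wrapper; NOT the Kendall package's actual code
kendall_checked <- function(x, y) {
  n <- length(x)
  if (length(y) != n) stop("x and y must have the same length")
  # n <= 3: refuse to continue
  if (n <= 3) stop("n <= 3: too few observations")
  # n < 12: continue, but flag the result
  if (n < 12) warning("n < 12: the p-value may be inaccurate")
  # Stand-in for the package's algorithm (an assumption of this sketch)
  cor.test(x, y, method = "kendall")
}
```

For example, kendall_checked(1:3, 3:1) stops with an error, while
kendall_checked(1:5, c(2, 1, 4, 3, 5)) returns a result with a warning.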

    AIM> Valz's algorithm was published in JCGS but I believe
    AIM> there is a mistake, so I don't use it.  The approximate
    AIM> algorithm, for p-values, that is used in Kendall, has
    AIM> been extensively tested.

    AIM> Also, I doubt if the current p-values from cor.test are
    AIM> correct for small n and I notice that ties in one
    AIM> ranking do produce a warning.

That's an interesting point about which I think we should
exchange more, but really in a different thread, possibly on
R-devel rather than R-help.

Thanking you and apologizing once more:
Martin Maechler, ETH Zurich

    AIM> Finally, I will also make it clearer in the
    AIM> documentation that cor and cor.test are alternative
    AIM> functions which may be more appropriate for some users.

    AIM> Ian McLeod

    >> On Sat, Nov 22, 2008 at 9:04 AM, Martin Maechler
    >> <maechler at stat.math.ethz.ch> wrote:
    SM> I believe Kendall tau is well-defined for this case...
    >>> 
    >>> The real question is *WHY* there needs to be a separate
    >>> package 'Kendall' when R itself does everything you want
    >>> and does not show any problems?
    >> 
    >> Thanks for pointing me to cor(...,method="kendall"),
    >> which I did not know about; I used the Kendall CRAN
    >> package out of pure ignorance.
    >> 
    >> In my defense, I think it is excusable ignorance, as
    >> Search on the R Project home page finds the Kendall
    >> package (which only mentions cor as a "See Also").  I
    >> only more recently discovered the advantages of
    >> help.search.
    >> 
    >> By the way, is Kendall well-defined when the arguments
    >> are not permutations of each other?  cor seems to return
    >> results even in this case:
    >> 
    >> a <- factor(c("Alice","Bob","Chris"))
    >> b <- a[1:2]
    >> c <- a[2:3]
    >> cor(b,c,method="kendall") => 1
    >> 
    >> apparently interpreting b as c(1,2) and c as c(1,2) based
    >> on alphabetical order (even though it is an UNordered
    >> factor), which seems to make the value depend on the
    >> subjects' names, which I'd think was wrong for a
    >> rank-order statistic.
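
The dependence on level order can be made explicit with a small
sketch. Current versions of R reject factor input to cor(), so the
integer codes are extracted by hand here; that this matches what cor()
did with factors at the time is an assumption. The alternative level
order is arbitrary and chosen only for illustration.

```r
people <- c("Alice", "Bob", "Chris")

# Default: levels sorted alphabetically (Alice = 1, Bob = 2, Chris = 3)
a_alpha <- factor(people)
b_alpha <- as.integer(a_alpha[1:2])   # 1 2
c_alpha <- as.integer(a_alpha[2:3])   # 2 3
tau_alpha <- cor(b_alpha, c_alpha, method = "kendall")   # 1

# Same people, different level order (Chris = 1, Alice = 2, Bob = 3)
a_relab <- factor(people, levels = c("Chris", "Alice", "Bob"))
b_relab <- as.integer(a_relab[1:2])   # 2 3
c_relab <- as.integer(a_relab[2:3])   # 3 1
tau_relab <- cor(b_relab, c_relab, method = "kendall")   # -1
```

The same subjects give tau = 1 under one labelling and tau = -1 under
the other, so the result reflects the level order rather than anything
about the data.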
    >> 
    >> Thanks again,
    >> 
    >> -s
    >>