[R] Bug in Kendall for n<4?
Martin Maechler
maechler at stat.math.ethz.ch
Mon Nov 24 11:24:18 CET 2008
Dear Ian,
thanks a lot for your clarifications.
>>>>> "AIM" == A I McLeod <aim at stats.uwo.ca>
>>>>> on Sat, 22 Nov 2008 22:24:11 -0500 (EST) writes:
AIM> The package Kendall computes the p-value when there are
AIM> ties in one ranking. This often happens with trend
AIM> testing with environmental data. I get about 5-10
AIM> emails per year from scientists using Kendall for that
AIM> purpose who don't know how to use R very well. I
AIM> suspect this means there are many users of this
AIM> package.
Indeed, the case of ties in the data is an important one in
possibly many applications, and indeed, cor.test() is
and hence the Kendall package is
serving an important need!
I do apologize for my impolite wording, to which I was led by
the example (and 'Subject').
If the topic is just *computation* of Kendall's tau, I don't
think anyone should use the Kendall package.
If, however, one is interested in P-values of (H0: tau = 0),
your Kendall package is indeed a valuable asset!
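
A minimal sketch of that distinction in base R, using made-up data
(the vectors x and y below are illustrative assumptions, not taken
from this thread): cor() returns only the estimate, while cor.test()
also reports a P-value for H0: tau = 0.

```r
# Illustrative data: y is x with each adjacent pair swapped (no ties).
x <- 1:10
y <- c(2, 1, 4, 3, 6, 5, 8, 7, 10, 9)

# Estimate only: 40 concordant and 5 discordant pairs out of 45,
# so tau = 35/45, about 0.778.
tau <- cor(x, y, method = "kendall")

# Estimate plus a test of H0: tau = 0.
res <- cor.test(x, y, method = "kendall")

print(tau)
print(res$estimate)  # same value as tau, named "tau"
print(res$p.value)
```
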
AIM> Thank you though for your comments. So I will improve
AIM> the documentation for Kendall by terminating the
AIM> program with an error message when n<=3 (this case is
AIM> of no interest to me) and warning message when n<12
AIM> that the p-values may be inaccurate. My student Paul
AIM> Valz in his Ph.D. thesis discussed an enumeration
AIM> algorithm for the exact p-value computation for any n
AIM> with arbitrary ties in both variables -- but the
AIM> algorithm is complex and for practical purposes, I
AIM> prefer to use the algorithm in Kendall -- especially
AIM> for trend testing with block bootstrap. That is the
AIM> reason for the existence of this package.
AIM> Valz's algorithm was published in JCGS but I believe
AIM> there is a mistake, so I don't use it. The approximate
AIM> algorithm, for p-values, that is used in Kendall, has
AIM> been extensively tested.
AIM> Also, I doubt if the current p-values from cor.test are
AIM> correct for small n and I notice that ties in one
AIM> ranking do produce a warning.
That's an interesting point about which I think we should
exchange more, but really in a different thread, possibly on
R-devel rather than R-help.
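
For readers who want to see the warning mentioned above, here is a
small sketch with made-up data that has ties in one ranking only. The
vectors and the helper variable msgs are assumptions for illustration;
the sketch only collects whatever warning cor.test() emits and does
not judge the accuracy of the resulting P-value.

```r
# Illustrative data: ties in y only, none in x.
x <- 1:8
y <- c(1, 2, 2, 3, 4, 4, 5, 6)

# Collect warnings instead of printing them, then continue.
msgs <- character(0)
res <- withCallingHandlers(
  cor.test(x, y, method = "kendall"),
  warning = function(w) {
    msgs <<- c(msgs, conditionMessage(w))
    invokeRestart("muffleWarning")
  }
)

print(msgs)         # the warning text refers to the ties
print(res$p.value)  # P-value reported despite the ties
```
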
Thanking you and apologizing once more:
Martin Maechler, ETH Zurich
AIM> Finally, I will also make more clear in the
AIM> documentation about cor and cor.test being alternative
AIM> functions which may be more appropriate for some users.
AIM> Ian McLeod
>> On Sat, Nov 22, 2008 at 9:04 AM, Martin Maechler
>> <maechler at stat.math.ethz.ch> wrote:
SM> I believe Kendall tau is well-defined for this case...
>>>
>>> The real question is *WHY* there needs to be a separate
>>> package 'Kendall' when R itself does everything you want
>>> and does not show any problems?
>>
>> Thanks for pointing me to cor(...,method="kendall"),
>> which I did not know about; I used the Kendall CRAN
>> package out of pure ignorance.
>>
>> In my defense, I think it is excusable ignorance, as
>> Search on the R Project home page finds the Kendall
>> package (which only mentions cor as a "See Also"). I
>> only more recently discovered the advantages of
>> help.search.
>>
>> By the way, is Kendall well-defined when the arguments
>> are not permutations of each other? cor seems to return
>> results even in this case:
>>
>> a<-factor(c("Alice","Bob","Chris"))
>> b<-a[1:2]
>> c<-a[2:3]
>> cor(b,c,method="kendall") => 1
>>
>> apparently interpreting b as c(1,2) and c as c(1,2) based
>> on alphabetical order (even though it is an UNordered
>> factor), which seems to make the value depend on the
>> subjects' names, which I'd think was wrong for a
>> rank-order statistic.
>>
>> Thanks again,
>>
>> -s
>>
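
The dependence on the subjects' names can be sketched as follows. This
is a hedged illustration, not the code from the post: it converts the
factors to their integer codes explicitly with as.integer(), since a
current cor() may refuse factor input, and the renamed subject "Zed"
and the variable names cc, a2, b2, cc2 are assumptions made up for the
example.

```r
# Levels are sorted alphabetically: Alice = 1, Bob = 2, Chris = 3.
a  <- factor(c("Alice", "Bob", "Chris"))
b  <- a[1:2]   # codes 1, 2
cc <- a[2:3]   # codes 2, 3 (named cc to avoid masking c())
tau1 <- cor(as.integer(b), as.integer(cc), method = "kendall")

# Rename the second subject only: levels are now Alice = 1,
# Chris = 2, Zed = 3, so the same three people get codes 1, 3, 2.
a2  <- factor(c("Alice", "Zed", "Chris"))
b2  <- a2[1:2]  # codes 1, 3
cc2 <- a2[2:3]  # codes 3, 2
tau2 <- cor(as.integer(b2), as.integer(cc2), method = "kendall")

print(tau1)  # 1
print(tau2)  # -1
```

With only the label of one subject changed, the sign of the result
flips, which is the concern raised in the quoted question.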