[Rd] Fast Kendall's Tau

Adler, Avraham Avraham.Adler at guycarp.com
Wed Jun 27 17:10:40 CEST 2012

> -----Original Message-----
> From: Prof Brian Ripley [mailto:ripley at stats.ox.ac.uk]
> Sent: Wednesday, June 27, 2012 1:24 AM
> To: Duncan Murdoch
> Cc: Adler, Avraham; r-devel at r-project.org
> Subject: Re: [Rd] Fast Kendall's Tau
> On 26/06/2012 22:44, Duncan Murdoch wrote:
>> On 12-06-25 2:48 PM, Adler, Avraham wrote:
>>> Hello.
>>> Has any further action been taken regarding implementing David
>>> Simcha's fast Kendall tau code (now found in the package pcaPP as
>>> cor.fk) into R-base? It is literally hundreds of times faster,
>>> although I am uncertain as to whether he wrote code for testing the
>>> significance of the parameter. The last mention I have seen of this
>>> was in
>>> 2010 <https://stat.ethz.ch/pipermail/r-devel/2010-February/056745.html>.
>> You could check the NEWS file, but I don't remember anything being
>> done along these lines.  If the code is in a CRAN package, there
>> doesn't seem to be any need to move it to base R.
> In addition, this is something very specialized, and the code in R is fast
> enough for all but the most unusual instances of that specialized task.
> example(cor.fk) shows the R implementation takes well under a second for 2000
> cases (a far higher value than is usual).

Thank you all very much for the replies. I was approaching the problem from the vantage point of fitting Archimedean copulas to events that come from non-elliptical distributions, and I had a few hundred thousand data points. That is not as bad as the situation faced by the authors of this paper <http://vigna.dsi.unimi.it/ftp/papers/ParadoxicalPageRank.pdf>, who needed to calculate Kendall's tau based on hundreds of millions of pairs(!). I wrote an implementation in VBA, and when I went to R to confirm my calculations, I was surprised to see that even my VBA code was probably hundreds of times as fast as R (on a vector of exactly 100,000 pairs). The implementation in pcaPP runs in a second or less on the same vector.
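
For anyone who wants to reproduce this kind of comparison, here is a minimal sketch in R. It assumes the pcaPP package is installed (the fast path is skipped otherwise). The simulated data, the seed, and n = 2000 are illustrative assumptions, not the vectors described above, and timings will vary by machine.

```r
# Compare base R's Kendall tau with pcaPP::cor.fk on simulated data.
# Assumptions: simulated continuous data (so no ties); n chosen for illustration.
set.seed(1)
n <- 2000                      # raise with care: cor(method = "kendall") is O(n^2)
x <- rnorm(n)
y <- x + rnorm(n)

# Base R implementation
t_base <- system.time(tau_base <- cor(x, y, method = "kendall"))
cat("base R tau:", tau_base, " elapsed (s):", t_base[["elapsed"]], "\n")

# Fast implementation from pcaPP, if available
if (requireNamespace("pcaPP", quietly = TRUE)) {
  t_fast <- system.time(tau_fast <- pcaPP::cor.fk(x, y))
  cat("cor.fk tau:", as.numeric(tau_fast),
      " elapsed (s):", t_fast[["elapsed"]], "\n")
} else {
  message("pcaPP is not installed; only the base R timing is shown")
}
```

With continuous data the two estimates should agree up to floating-point error; the gap in elapsed time should widen quickly as n grows.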

Perhaps, as was suggested in another e-mail, the least intrusive (and best bang-for-buck) option is to update the help page for "cor" to refer to cor.fk, so that those of us who have to deal with ungainly data sets are made aware that it is available.

Thank you again,

Avraham Adler
