[Rd] O(N log N) Kendall Tau

Dirk Eddelbuettel edd at debian.org
Sun Dec 13 17:27:03 CET 2009


On 13 December 2009 at 00:38, David Simcha wrote:
| I've noticed that the implementation of Kendall's Tau in R is O(N^2).  
| The following reference describes how it can be done in O(N log N):
| 
|  A Computer Method for Calculating Kendall's Tau with Ungrouped Data
|  William R. Knight
|  Journal of the American Statistical Association, Vol. 61, No. 314, Part 
| 1 (Jun., 1966), pp. 436-439
|  http://www.jstor.org/pss/2282833

Interesting.  I recently used the performance of cor(X, method="kendall") as
a contrast to the 26-fold speedup when (!!) using gpuCor(X, method="kendall")
from the nice gputools package by Josh Buckner and Mark Seligman.  That was
using the GPU in a QuadroFX 4800 with a 1206 x 477 matrix; the full example
is in my most recent 'Intro to HPC with R' slides.

| I'm interested in contributing an implementation of this algorithm to 
| R.  However, I have a few questions:
| 
| 1.  Would it likely be accepted if well-implemented?
| 2.  cov.c in the main/ directory of the R source tree seems to contain 
| at least 4 different but nearly identical implementations of Kendall's 
| Tau:  cov_complete1, cov_complete2, cov_na1, and cov_na2.  I can't tell 
| why.  

Combine your bottom-up code analysis with a top-down usage analysis -- open
up R and read 'help(cov)'.  There are different ways to deal with missing
observations.  

| The file is very difficult to read because of its lack of 
| comments, (ab)use of macros and improper indentation.  As I will 
| probably need to modify this file, can some seasoned veteran please 
| explain what the heck is going on in this file?

Again, I think reading the help file along with the code may prove
beneficial.  The R implementation is pretty consistent across files to you
will have to get used to those C level macros if you want to modify code at
that level.  The R Extensions manual may prove helpful.

Lastly, as a matter of style, I find people are more likely to help you if
you don't hit them first with a two-by-four of 'your code and style is horrid'.

Cheers, Dirk

-- 
Three out of two people have difficulties with fractions.



More information about the R-devel mailing list