[R] spearman rank correlation problem

Martin Maechler maechler at stat.math.ethz.ch
Tue Mar 16 10:25:49 CET 2004


>>>>> "William" == William T Morgan <wmorgan at mitre.org>
>>>>>     on 15 Mar 2004 16:37:08 -0500 writes:

    William> Hello R gurus,
    William> I want to calculate the Spearman rho between two ranked lists. I am
    William> getting results with cor.test that differ in comparison to my own
    William> spearman function:

    >> my.spearman
    William> function(l1, l2) {
    William>   if(length(l1) != length(l2)) stop("lists must have same length")
    William>   r1 <- rank(l1)
    William>   r2 <- rank(l2)
    William>   dsq <- sapply(r1-r2,function(x) x^2)
    William>   1 - ((6 * sum(dsq)) / (length(l1) * (length(l1)^2 - 1)))
    William> }

    William> Perhaps I'm doing something wrong in that code, but it's a pretty
    William> straightforward calculation, so it's hard to see what, especially with
    William> rank() handling the ties correctly. 

Well, the "ties" in your example are really the "problem".
The formula you use,
    1 - 6 * sum(d^2) / (n^3 - n),   where d = r1 - r2, r1 = rank(x1), r2 = rank(x2),
equals the more natural definition,
cor(r1, r2),  only when there are no ties
[plus in a few "lucky" situations with ties].
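For readers wondering where the shortcut comes from, here is a sketch of the standard derivation. It assumes no ties, so that r1 and r2 are each a permutation of 1, ..., n:

```latex
% Assumption: no ties, so (r_{11},...,r_{1n}) and (r_{21},...,r_{2n})
% are each a permutation of 1,...,n; d_i = r_{1i} - r_{2i}
\bar r = \frac{n+1}{2}, \qquad
\sum_{i=1}^n (r_{1i}-\bar r)^2 = \sum_{i=1}^n (r_{2i}-\bar r)^2 = \frac{n(n^2-1)}{12} =: V

\sum_{i=1}^n d_i^2
  = \sum_{i=1}^n \bigl[(r_{1i}-\bar r)-(r_{2i}-\bar r)\bigr]^2
  = 2V - 2\sum_{i=1}^n (r_{1i}-\bar r)(r_{2i}-\bar r)

\operatorname{cor}(r_1, r_2)
  = \frac{\sum_i (r_{1i}-\bar r)(r_{2i}-\bar r)}{V}
  = \frac{V - \tfrac12 \sum_i d_i^2}{V}
  = 1 - \frac{6\sum_i d_i^2}{n^3 - n}
```

With ties, rank() assigns average ranks: the mean is still (n+1)/2, but the sum of squared deviations of the tied ranking is smaller than n(n^2 - 1)/12, so the last step no longer holds in general.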

In R, cor.test() has always computed Spearman's rho as the correlation
of the ranks, and so does the new  cor(*, method = "spearman").
It seems this needs to be documented, since you are right:
the "1 - 6 sum(d^2) / (n^3 - n)" formula is also in use as the *definition*
of Spearman's rank correlation.
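A small R sketch of the discrepancy, using only base R's rank(), cor() and cor.test(). The data vectors and the helper name shortcut.rho are hypothetical, made up for illustration; exact = FALSE is passed to cor.test() only to avoid the "exact p-value with ties" warning and does not change the estimate.

```r
## Hypothetical example data (not from the original thread); x contains a tie.
x <- c(1, 2, 2, 3, 4)
y <- c(2, 1, 4, 3, 5)

## Hypothetical helper: the shortcut formula 1 - 6 * sum(d^2) / (n^3 - n)
shortcut.rho <- function(x, y) {
  stopifnot(length(x) == length(y))
  n <- length(x)
  d <- rank(x) - rank(y)   # rank() gives tied values their average rank by default
  1 - 6 * sum(d^2) / (n^3 - n)
}

## Spearman's rho as the Pearson correlation of the (average) ranks
rank.rho <- cor(rank(x), rank(y))

rank.rho                                                  # about 0.667
cor(x, y, method = "spearman")                            # same as rank.rho
cor.test(x, y, method = "spearman", exact = FALSE)$estimate  # same as rank.rho
shortcut.rho(x, y)                                        # 0.675: differs, x has ties

## Without ties the two definitions agree
x2 <- c(1, 2, 5, 3, 4)
all.equal(shortcut.rho(x2, y), cor(rank(x2), rank(y)))   # TRUE
```

Here the tie shrinks the spread of rank(x) (sum of squared deviations 9.5 instead of 10), which the shortcut formula does not account for.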

Martin Maechler <maechler at stat.math.ethz.ch>	http://stat.ethz.ch/~maechler/
Seminar fuer Statistik, ETH-Zentrum  LEO C16	Leonhardstr. 27
ETH (Federal Inst. Technology)	8092 Zurich	SWITZERLAND
phone: x-41-1-632-3408		fax: ...-1228			<><
