# [R] R's Spearman

Mendiburu, Felipe (CIP) F.MENDIBURU at CGIAR.ORG
Thu May 31 22:06:23 CEST 2007

```Dear Ray,

The R's Spearman calculated by R is correct for ties or nonties, which is not correct is the probability for the case of ties. I send to you formulates it for the correlation with ties, that is equal to R.

Regards,

Felipe de Mendiburu
Statistician

# Spearman correlation "rs" with ties or no ties
rs<-function(x,y) {
d<-rank(x)-rank(y)
tx<-as.numeric(table(x))
ty<-as.numeric(table(y))
Lx<-sum((tx^3-tx)/12)
Ly<-sum((ty^3-ty)/12)
N<-length(x)
SX2<- (N^3-N)/12 - Lx
SY2<- (N^3-N)/12 - Ly
rs<- (SX2+SY2-sum(d^2))/(2*sqrt(SX2*SY2))
return(rs)
}

# Aplicacion
> cor(y[,1],y[,2],method="spearman")
[1] 0.2319084
> rs(y[,1],y[,2])
[1] 0.2319084

-----Original Message-----
From: r-help-bounces at stat.math.ethz.ch
[mailto:r-help-bounces at stat.math.ethz.ch]On Behalf Of Raymond Wan
Sent: Monday, May 28, 2007 10:29 PM
To: r-help at stat.math.ethz.ch
Subject: [R] R's Spearman

Hi all,

I am trying to figure out the formula used by R's Spearman rho (using
cor(method="spearman")) because I can't seem to get the same value as by
calculating "by hand".  Perhaps I'm using "cor" wrong, but I don't know
where.  Basically, I am running these commands:

> y
IQ Hours
1 106     7
2  86     0
3  97    20
4 113    12
5 120    12
6 110    17
> cor(y[1],y[2],method="spearman")
Hours
IQ 0.2319084

[it's an abbreviated example of one I took from Wikipedia].  I
calculated by hand (apologies if the table looks strange when pasted
into e-mail):

IQ    Hours    rank(IQ)  rank(hours)    diff    diff^2
1    106    7        3         2                 1    1
2     86    0        1         1                 0    0
3     97    20       2         6                -4    16
4    113    12       5         3.5             1.5    2.25
5    120    12       6         3.5             2.5    6.25
6    110    17       4         5                -1    1
26.5

rho=    0.242857

where rho = (1 - ((6 * 26.5) / 6 * (6^2 - 1))).  I kept modifying the
table and realized that the difference in result comes from ties.  i.e.,
if I remove the tie in rows 4 and 5, I get the same result from both cor
and calculating by hand.  Perhaps I'm handling ties wrong...does anyone
know how R does it or perhaps I need to change how I'm using it?

Thank you!

Ray

______________________________________________
R-help at stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help