[R] R's Spearman

Mendiburu, Felipe (CIP) F.MENDIBURU at CGIAR.ORG
Thu May 31 22:06:23 CEST 2007


Dear Ray,

The R's Spearman calculated by R is correct for ties or nonties, which is not correct is the probability for the case of ties. I send to you formulates it for the correlation with ties, that is equal to R. 

Regards,

Felipe de Mendiburu
Statistician


# Spearman correlation "rs" with ties or no ties
rs<-function(x,y) {
d<-rank(x)-rank(y)
tx<-as.numeric(table(x))
ty<-as.numeric(table(y))
Lx<-sum((tx^3-tx)/12)
Ly<-sum((ty^3-ty)/12)
N<-length(x)
SX2<- (N^3-N)/12 - Lx
SY2<- (N^3-N)/12 - Ly
rs<- (SX2+SY2-sum(d^2))/(2*sqrt(SX2*SY2))
return(rs)
}

# Aplicacion
> cor(y[,1],y[,2],method="spearman")
[1] 0.2319084
> rs(y[,1],y[,2])
[1] 0.2319084



-----Original Message-----
From: r-help-bounces at stat.math.ethz.ch
[mailto:r-help-bounces at stat.math.ethz.ch]On Behalf Of Raymond Wan
Sent: Monday, May 28, 2007 10:29 PM
To: r-help at stat.math.ethz.ch
Subject: [R] R's Spearman



Hi all,

I am trying to figure out the formula used by R's Spearman rho (using 
cor(method="spearman")) because I can't seem to get the same value as by 
calculating "by hand".  Perhaps I'm using "cor" wrong, but I don't know 
where.  Basically, I am running these commands:

 > y=read.table(file="tmp",header=TRUE,sep="\t")
 > y
   IQ Hours
1 106     7
2  86     0
3  97    20
4 113    12
5 120    12
6 110    17
 > cor(y[1],y[2],method="spearman")
       Hours
IQ 0.2319084

[it's an abbreviated example of one I took from Wikipedia].  I 
calculated by hand (apologies if the table looks strange when pasted 
into e-mail):

      IQ    Hours    rank(IQ)  rank(hours)    diff    diff^2
1    106    7        3         2                 1    1
2     86    0        1         1                 0    0
3     97    20       2         6                -4    16
4    113    12       5         3.5             1.5    2.25
5    120    12       6         3.5             2.5    6.25
6    110    17       4         5                -1    1
                                                      26.5
                           
                                              rho=    0.242857

where rho = (1 - ((6 * 26.5) / 6 * (6^2 - 1))).  I kept modifying the 
table and realized that the difference in result comes from ties.  i.e., 
if I remove the tie in rows 4 and 5, I get the same result from both cor 
and calculating by hand.  Perhaps I'm handling ties wrong...does anyone 
know how R does it or perhaps I need to change how I'm using it?

Thank you!

Ray

______________________________________________
R-help at stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



More information about the R-help mailing list