[R] Spearman's Correlation Coefficient to compare distributions?

Ralf B ralf.bierig at gmail.com
Thu Jul 29 09:00:03 CEST 2010


Hi,

I have two different data sets and I would like to measure how similar
their distributions (in terms of their bin frequencies) are. In other
words, I am not interested in the exact sequence of data points but
rather in their distributional properties and how similar those are.
Spearman's rank correlation coefficient measures monotonic association
without assuming normality. I wonder if this measure can also be used to
compare distributional data (bin counts) rather than the data points that
are summarized in a distribution. Here is example code showing what I
would like to check:

# two samples from a standard normal and one from a uniform on [0, 1]
aNorm <- rnorm(1000000)
bNorm <- rnorm(1000000)
cUni <- runif(1000000)
# histograms with default breaks (chosen separately for each sample)
ha <- hist(aNorm)
hb <- hist(bNorm)
hc <- hist(cUni)
print(ha$counts)
print(hb$counts)
print(hc$counts)
# relatively similar
# truncate to the shorter count vector so both have the same length
n <- min(length(ha$counts), length(hb$counts))
cor.test(ha$counts[1:n], hb$counts[1:n], method="spearman")
# quite different
n <- min(length(ha$counts), length(hc$counts))
cor.test(ha$counts[1:n], hc$counts[1:n], method="spearman")
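
One thing I am unsure about is that hist() picks its breaks separately
for each sample, so bin i of one histogram need not cover the same
interval as bin i of another. A minimal sketch of a variant with shared
breaks follows. The smaller sample size, the fixed seed, the choice of 30
bins and the names (breaks, ca, cb, cu, rho_ab, rho_au) are my own
assumptions for illustration. The two-sample Kolmogorov-Smirnov test on
the raw data is included only as a point of comparison, not as a claim
that it is the right answer here.

```r
set.seed(1)  # assumption: fixed seed so the sketch is reproducible

# smaller samples than above, only to keep the sketch quick
a <- rnorm(10000)
b <- rnorm(10000)
u <- runif(10000)

# shared bin edges spanning all three samples (30 bins is an arbitrary choice),
# so that bin i refers to the same interval in every histogram
breaks <- seq(min(a, b, u), max(a, b, u), length.out = 31)

ca <- hist(a, breaks = breaks, plot = FALSE)$counts
cb <- hist(b, breaks = breaks, plot = FALSE)$counts
cu <- hist(u, breaks = breaks, plot = FALSE)$counts

# Spearman correlation of the aligned bin counts
rho_ab <- cor(ca, cb, method = "spearman")
rho_au <- cor(ca, cu, method = "spearman")
print(c(normal_vs_normal = rho_ab, normal_vs_uniform = rho_au))

# for comparison: two-sample Kolmogorov-Smirnov test on the raw data
ks_ab <- ks.test(a, b)
ks_au <- ks.test(a, u)
print(c(ks_normal_vs_normal = ks_ab$p.value,
        ks_normal_vs_uniform = ks_au$p.value))
```

With shared breaks no truncation to a common length is needed, since all
count vectors have the same length by construction.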

Does this make sense or am I violating some assumptions of the coefficient?

Thanks,
R.


