[R] Test statistic for Spearman correlation

Martin Maechler maechler at stat.math.ethz.ch
Thu May 1 21:54:58 CEST 2003


>>>>> "PD" == Peter Dalgaard BSA <p.dalgaard at biostat.ku.dk>
>>>>>     on 01 May 2003 19:20:04 +0200 writes:

    PD> Thomas W Blackwell <tblackw at umich.edu> writes:
    >> Brett -
    >> 
    >> I can give you a further reference, but you may not find
    >> it much help!
    >> 
    >> E. G. Olds.  Distribution of sums of squares of rank
    >> differences for small numbers of individuals.  Annals of
    >> Mathematical Statistics, v.9, pp. 133-148, 1938.
    >> 
    >> My source says that "Olds (1938) tabulated the exact
    >> distribution of a quantity S related to rho by the
    >> equation
    >> 
    >> R = 1 - 6 * S / (n^3 - n) ."
    >> 
    >> Olds must have been using a Comptometer or a Marchant
    >> calculator, so presumably he chose this construct because
    >> S is guaranteed to be an integer.  Algorithm AS 89 is
    >> certainly available online from Statlib.

    PD> The title of Olds' paper might have given you a hint:

    >> x <- rank(rnorm(10))
    >> y <- rank(rnorm(10))
    >> cor(x,y)
    PD> [1] -0.2242424
    >> 990/6*(1-cor(x,y))
    PD> [1] 202
    >> sum((x-y)^2)
    PD> [1] 202

    PD> BTW, the identity breaks down when there are ties,
    PD> something that we probably ought to look into at some
    PD> point. The code does say that the p values may be
    PD> incorrect, but I suspect they may be more incorrect than
    PD> need be.
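Both of Peter's points can be checked directly. The sketch below uses hypothetical data instead of `rnorm()` so that the result is reproducible. The helper name `check_identity` is made up for this illustration. The helper compares `cor(x, y, method = "spearman")` with 1 - 6 * S / (n^3 - n). Here S is the sum of squared rank differences, and ties receive R's default mid-ranks.

```r
# Sketch (hypothetical helper and data): compare Spearman's rho with
# 1 - 6 * S / (n^3 - n), where S = sum of squared rank differences.
check_identity <- function(x, y) {
  rx <- rank(x)                        # ties get mid-ranks (R's default)
  ry <- rank(y)
  n  <- length(x)
  S  <- sum((rx - ry)^2)
  c(S = S,
    rho = cor(x, y, method = "spearman"),   # Pearson correlation of the ranks
    formula = 1 - 6 * S / (n^3 - n))
}

## No ties, n = 10 (so n^3 - n = 990): S = 90 and rho = 1 - 540/990
no_ties <- check_identity(c(3, 1, 4, 10, 5, 9, 2, 6, 8, 7), 1:10)

## One tie in x: S = 0.5 is no longer an integer, and
## rho = sqrt(0.95) (about 0.97468) differs from the formula's 0.975
ties <- check_identity(c(1, 2, 2, 3, 4), 1:5)

print(rbind(no_ties, ties))
```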

Yes, I'm quite sure of this (both/all statements).
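The standard algebra behind the identity shows why it requires untied ranks. This is a sketch, with S_{xy} denoting the centred cross-product of the ranks x_i and y_i. Without ties, both rank vectors are permutations of 1, ..., n. They therefore have the same mean, so x_i - y_i = (x_i - \bar x) - (y_i - \bar y):

```latex
\bar x = \bar y = \tfrac{n+1}{2}, \qquad
S_{xx} = S_{yy} = \sum_{i=1}^{n} i^2 - n\Bigl(\tfrac{n+1}{2}\Bigr)^2 = \frac{n^3 - n}{12}

S = \sum_{i=1}^{n} (x_i - y_i)^2 = S_{xx} + S_{yy} - 2 S_{xy} = \frac{n^3 - n}{6} - 2 S_{xy}

R = \frac{S_{xy}}{\sqrt{S_{xx} S_{yy}}} = \frac{12\, S_{xy}}{n^3 - n} = 1 - \frac{6 S}{n^3 - n}
```

With ties, mid-ranks keep the mean at (n+1)/2 but make S_{xx} (or S_{yy}) smaller than (n^3 - n)/12, so the last step fails. For example, the ranks 1, 2.5, 2.5, 4, 5 give S_{xx} = 9.5 rather than 10.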

Note that I still have uncommitted fixes for the problem with large n.
For the "proper" fix, I got interested and have spent quite some time
over the last few weeks reading several of the original papers on this.
I also found that there are now much better (i.e., faster) methods
available for the exact calculation of P-values.
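The quantity Olds tabulated is the exact null distribution of S. For very small n, that distribution can be obtained by brute force. Under independence, every permutation p of 1:n is equally likely, and S = sum((1:n - p)^2). The sketch below enumerates all permutations. It is *not* one of the faster methods mentioned above, nor Algorithm AS 89. The function names `perms`, `exactS` and `pS_upper` are hypothetical.

```r
# Sketch: exact null distribution of S = sum((1:n - p)^2) over all n!
# permutations p of 1:n, by brute-force enumeration (hypothetical helpers).
# Only practical for small n (say n <= 8), since n! grows very quickly.
perms <- function(v) {
  # all permutations of vector v, one per row
  if (length(v) <= 1) return(matrix(v, nrow = 1))
  do.call(rbind, lapply(seq_along(v), function(i)
    cbind(v[i], perms(v[-i]))))
}

exactS <- function(n) {
  P <- perms(seq_len(n))
  ident <- matrix(seq_len(n), nrow(P), n, byrow = TRUE)  # each row is 1:n
  S <- rowSums((P - ident)^2)
  table(S) / nrow(P)            # P(S = s) under independence
}

pS_upper <- function(s, n) {
  # exact P(S >= s); large S corresponds to negative rho,
  # since rho = 1 - 6 * S / (n^3 - n)
  d <- exactS(n)
  sum(d[as.numeric(names(d)) >= s])
}
```

For n = 3, the six permutations give S = 0, 2, 2, 6, 6, 8, so `pS_upper(8, 3)` is 1/6. For n = 10, the enumeration already involves 3,628,800 permutations. That cost is why smarter algorithms matter.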

Currently I plan to have an improvement here in R 1.7.1, and more
in 1.8.0.

Martin


