[R] Weird differing results when using the Wilcoxon-test

peter dalgaard pdalgd at gmail.com
Wed Aug 18 15:47:09 CEST 2010


On Aug 18, 2010, at 11:55 AM, Cedric Laczny wrote:

> I was able to trace down the unexpected behavior to the following line
> SIGMA <- sqrt((n.x * n.y/12) * ((n.x + n.y + 1) - 
>                sum(NTIES^3 - NTIES)/((n.x + n.y) * (n.x + n.y - 
>                  1))))
> My calculations of the Z-score for the normal approximation were based on
> using the standard deviation for ranks _without_ ties. The above formula seems 
> to account for ties and thus, yields a slightly different z-score. However, the 
> data seems to include at most 1 tie (based on rnorm), so it would be the same 
> result as if it contained no tie (1^3 - 1 has the same result as 0^3 - 0, 
> obviously ;) ) and thus I would expect the result to be the same as when using 
> the formula for the standard deviation without ties.

Note the definition NTIES <- table(r): it counts the number of observations tied at each rank, so it is all ones if and only if there are NO ties in the data. A single tied pair shows up as an entry of 2, which contributes 2^3 - 2 = 6 to the correction term.
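A minimal sketch of this point, assuming small made-up vectors x, y and y2 (they are illustrative and not from the original thread); the SIGMA expression is the one quoted above:

```r
# Illustrative data only; not taken from the thread.
x <- c(1.1, 2.3, 3.7, 4.2)
y <- c(0.5, 2.9, 5.0)
n.x <- length(x)
n.y <- length(y)

r <- rank(c(x, y))
NTIES <- table(r)        # number of observations sharing each rank
all(NTIES == 1)          # TRUE: every rank occurs once, i.e. no ties

# Tie-corrected standard deviation, as in the quoted line.
# With no ties the correction term is sum(1^3 - 1) = 0.
SIGMA <- sqrt((n.x * n.y / 12) *
              ((n.x + n.y + 1) -
               sum(NTIES^3 - NTIES) / ((n.x + n.y) * (n.x + n.y - 1))))

# Standard deviation without any tie correction.
SIGMA0 <- sqrt(n.x * n.y * (n.x + n.y + 1) / 12)
all.equal(SIGMA, SIGMA0) # TRUE

# One tied pair (2.3 now occurs in both samples): one entry of NTIES is 2.
y2 <- c(0.5, 2.3, 5.0)
NTIES2 <- table(rank(c(x, y2)))
sum(NTIES2^3 - NTIES2)   # 6, so the tie-corrected SIGMA becomes smaller
```

So "at most one tie" in the data corresponds to an entry of 2 in NTIES, not 1, and the two formulas then differ.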

(If you are in paper-and-pencil mode, these formulas are fairly easily worked out once you realize that you only need the mean and variance of the rank of a single observation: the covariances are C(R1,R2) = -V(R1)/(N-1) because of symmetry and the fact that the sum of all N ranks is fixed.)
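The argument behind that covariance: the sum of all N ranks is constant, so its variance N*V + N*(N-1)*C is zero, which gives C = -V/(N-1). A small numerical check of this, as a sketch only (N, n.x, n.y and the helper names below are chosen for illustration and assume no ties):

```r
N <- 5
ranks <- 1:N                      # ranks without ties

# Under the null, (R1, R2) is uniform over ordered pairs of distinct ranks.
pairs <- expand.grid(i = 1:N, j = 1:N)
pairs <- pairs[pairs$i != pairs$j, ]
R1 <- ranks[pairs$i]
R2 <- ranks[pairs$j]

V <- mean(ranks^2) - mean(ranks)^2         # variance of a single rank
C <- mean(R1 * R2) - mean(R1) * mean(R2)   # covariance of two ranks
all.equal(C, -V / (N - 1))                 # TRUE

# Variance of the rank sum of a sample of size n.x out of N = n.x + n.y.
n.x <- 2
n.y <- N - n.x
var.W <- n.x * V + n.x * (n.x - 1) * C
all.equal(var.W, n.x * n.y * (N + 1) / 12) # TRUE: the usual no-ties formula
```

Replacing 1:N by midranks changes V but not the relation C = -V/(N-1), which is where the tie correction in SIGMA comes from.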

-- 
Peter Dalgaard
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd.mes at cbs.dk  Priv: PDalgd at gmail.com


