peter dalgaard
pdalgd at gmail.com
Wed Sep 3 23:20:04 CEST 2014
Notice that correct=TRUE for wilcox.test refers to the continuity correction, not the correction for ties.
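A small sketch of what that argument does, using the same data as below. The names w_cc and w_nocc are just for illustration. With ties present, wilcox.test falls back to the normal approximation either way (and warns about it, hence the suppressWarnings); the tie adjustment of the variance is applied in both calls, and only the continuity correction is switched.

```r
x <- c(359,359,359,359,359,359,335,359,359,359,359,
       359,359,359,359,359,359,359,359,359,359,303,359,359,359)
y <- c(332,85,359,359,359,220,231,300,359,237,359,183,286,
       355,250,105,359,359,298,359,359,359,28.6,359,359,128)

# Same test twice; only the continuity correction differs
w_cc   <- suppressWarnings(wilcox.test(x, y, correct = TRUE))
w_nocc <- suppressWarnings(wilcox.test(x, y, correct = FALSE))

w_cc$statistic                     # observed W (rank sum of x minus 25*26/2)
c(with_cc = w_cc$p.value, without_cc = w_nocc$p.value)
```

The continuity correction shrinks |z| by half a unit of W, so the p-value with correct=TRUE should come out slightly larger of the two.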
You can fairly easily simulate from the exact distribution of W:
x <- c(359,359,359,359,359,359,335,359,359,359,359,
359,359,359,359,359,359,359,359,359,359,303,359,359,359)
y <- c(332,85,359,359,359,220,231,300,359,237,359,183,286,
355,250,105,359,359,298,359,359,359,28.6,359,359,128)
R <- rank(c(x,y))
sim <- replicate(1e6,sum(sample(R,25))) - 325
# With no ties, the ranks would be a permutation of 1:51, and we could do
sim2 <- replicate(1e6,sum(sample(1:51,25))) - 325
In either case, the two-sided p-value is the probability that W >= 485 (the observed value) or W <= 165 (its mirror image about the null mean of 325), and
> mean(sim >= 485 | sim <= 165)
[1] 0.000151
> mean(sim2 >= 485 | sim2 <= 165)
[1] 0.002182
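For the untied case there is no real need to simulate, since the exact null distribution is available through pwilcox. As a rough cross-check on sim2 (the name p_exact_untied is made up for this sketch), using the symmetry of W about 325:

```r
# Exact two-sided p-value when the ranks are a permutation of 1:51
# (m = 25, n = 26); by symmetry P(W >= 485) equals P(W <= 165)
p_exact_untied <- 2 * pwilcox(165, 25, 26)
p_exact_untied
```

This should land close to the simulated value for sim2. No such shortcut exists for the tied ranks, which is why the simulation from R is needed there.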
Also, try
plot(density(sim))
lines(density(sim2))
and notice that the distribution of sim is narrower than that of sim2 (hence the smaller p-value with tie correction), but also that the normal approximation is not nearly as good as in the untied case. The "clumpiness" is due to the fact that 35 of the 51 ranks share the maximum value of 34 (corresponding to the original 359s).
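Both remarks can be checked numerically. The sketch below tabulates the ranks and compares the null standard deviation of W with and without the usual tie adjustment (the variable names are only illustrative; the formula is the standard tie-corrected variance of the rank-sum statistic):

```r
x <- c(359,359,359,359,359,359,335,359,359,359,359,
       359,359,359,359,359,359,359,359,359,359,303,359,359,359)
y <- c(332,85,359,359,359,220,231,300,359,237,359,183,286,
       355,250,105,359,359,298,359,359,359,28.6,359,359,128)
R <- rank(c(x, y))

table(R)                 # one big clump at rank 34, singletons below it
n_top <- sum(R == 34)    # number of observations tied at the maximum

m <- length(x); n <- length(y); N <- m + n
ties <- as.numeric(table(R))   # sizes of the tie groups

# Null SD of W without ties, and with the tie adjustment
sd_untied <- sqrt(m * n * (N + 1) / 12)
sd_tied   <- sqrt(m * n / 12 * ((N + 1) - sum(ties^3 - ties) / (N * (N - 1))))
c(untied = sd_untied, tied = sd_tied)
```

The tied SD comes out noticeably smaller, which matches the narrower density of sim in the plot.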
-pd
