[Rd] wilcox.test point estimates perverse (PR#1150)

Sat, 27 Oct 2001 12:34:54 +0200 (MEST)

> The point estimates produced by wilcox.test are perverse (not wrong, just
> brain damaged).  The Hodges-Lehmann estimator that goes with the signed
> rank test is the median of the Walsh averages.  The Hodges-Lehmann estimator
> that goes with the rank sum test is the median of the pairwise differences.
> 
> wilcox.test agrees except that it uses the following very peculiar definition
> of "sample median": if the number of items is even, the average of the two
> middle items (agrees with the usual definition), and if the number of items
> is odd, the average of the two on either side of the middle item in sorted
> order (huh???  why???).  I know this is asymptotically equivalent to the
> usual definition, but
> 
>   * Why get answers that disagree with every nonparametrics textbook?
> 
>   * If wilcox.test is right then median is wrong and should be fixed
>     (just kidding, don't mess with median!)
> 
> Thus the complicated code in lines 87--89 of wilcox.test.default
> should be replaced by the simple
> 
>   ESTIMATE <- median(diffs)
> 
> and the complicated code in lines 214-216 of wilcox.test.default should
> be again be replaced replaced by the simple
> 
>   ESTIMATE <- median(diffs)
> 

For tied samples this estimators may be not inside the confidence sets which
is confusing. Example (from ?wilcox.exact, package exactRankTests):

> treat <- c(94, 108, 110, 90)
> contr <- c(80, 94, 85, 90, 90, 90, 108, 94, 78, 105, 88)
> 
> # StatXact 4 for Windows: p.value = 0.0989, point prob = 0.019
> 
> wilcox.exact(contr, treat, conf.int=T)

        Exact Wilcoxon rank sum test

data:  contr and treat 
W = 9, point prob = 0.019, p-value = 0.0989 
alternative hypothesis: true mu is not equal to 0 
95 percent confidence interval:
 -22   4 
sample estimates:
difference in location 
                    13 		<- when computed with median(diffs)

For compatibility between the 2 versions of the Wilcoxon-Test, 
we use the basic definition: 

  d_1 = sup {d | W(d) > E(W) }
  d_2 = inf {d | W(d) < E(W) }

  Hodges-Lehmann = mean(d1,d2)

(using max and min instead of sup and inf which causes the difference).
However, this may be questionable ...

> Moreover, there is NO "correction for ties" in the Hodges-Lehmann estimator.
> Thus the code in lines 147-148 and 272-273 is silly.  The code for the point
> estimate should be done by exactly the same code when there are ties or zeroes
> (or both) and when there are not.  Reference: Sections 3.2 and 4.2 of
> Hollander and Wolfe.

wilcox.test uses the normal approximation when 

a) the sample sizes are large or
b) ties occur.

computing all differences when a) is not feasible (taking it to C does not
improve `outer(x,y,"-")' significantly). Therefore, we
use uniroot for searching d with W(X - d, Y) = E(W).

In case b) in `wilcox.test' the normal
approximation is used for p-values and confidence intervals, it seems
natural to me to compute the point estimator the way the conf ints are
computed. Additionally median(diffs) may lie outside the confidence set
in this situation (see above).

> 
> I understand that one doesn't want to produce the vector diffs (which is
> order n^2 when n is large), but one doesn't have to to calculate the median
> if this is taken to C.
> 
> Moreover that uniroot stuff sometimes crashes (sorry, I didn't save the
> examples, but take my word for it, it's not bulletproof).
> 
> Note that the confidence intervals are also bizarre in the case of ties, but
> that's another bug report.

Because the normal approximation is bizzare for small, tied samples? 
That is what `wilcox.exact' is for. 

Torsten

> -- 
> Charles Geyer
> Professor, School of Statistics
> University of Minnesota
> charlie@stat.umn.edu
> 
> -.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
> r-devel mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
> Send "info", "help", or "[un]subscribe"
> (in the "body", not the subject !)  To: r-devel-request@stat.math.ethz.ch
> _._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._
> 

-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-devel mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-devel-request@stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._