[Rd] Bug in wilcox.test

Charles Geyer charlie at stat.umn.edu
Sun Jan 29 18:23:18 CET 2006


There is a fairly new bug in wilcox.test in R-2.2.1 (stable).
It wasn't there when I last taught nonparametrics in fall 2003.

Line 86 of wilcox.test.R

    achieved.alpha<-2*psignrank(trunc(qu),n)

It should be

    achieved.alpha<-2*psignrank(trunc(qu)-1,n)

If you don't see why, decode the cookbook instructions p. 56 in
Hollander and Wolfe (2nd ed.) or see

    http://www.stat.umn.edu/geyer/5601/examp/signrank.html#conf

or just do a sanity check: does this to the right thing when the confidence
interval is the range of the data, case qu = 1?  No.

Of course, this error isn't very visible, because wilcox.test still
prints the ASKED FOR confidence level instead of the ACTUAL ACHIEVED
confidence level (which sucks IMHO, but never mind) except when
it incorrectly thinks that the level cannot be achieved, in which
case it prints the incorrect achieved level.  Just great.

To see the bug do

   X <- read.table(url("http://www.stat.umn.edu/geyer/5601/hwdata/t3-3.txt"),
       header = TRUE)
   attach(X)
   wilcox.test(y, x, paired = TRUE, conf.int = TRUE)

and compare with what you get when you change t3-1.txt to t3-3.txt in
the Rweb form in

    http://www.stat.umn.edu/geyer/5601/examp/signrank.html#conf

and submit.

Sorry to sound so grumpy about this, but I hate having my
homework solutions explain that R sucks (in this instance).

Better yet, NEVER use wilcox.test.  Always use wilcox.exact in exactRankTests
or fuzzy.signrank.ci in fuzzyRankTests.

   X <- read.table(url("http://www.stat.umn.edu/geyer/5601/hwdata/t3-3.txt"),
       header = TRUE)
   attach(X)
   library(fuzzyRankTests)
   fuzzy.signrank.ci(y - x)

prints

            Wilcoxon signed rank test

    data:  y - x
    95 percent confidence interval:

    Randomized confidence interval is mixture of two intervals

     probability lower end upper end
             0.9       -25       605
             0.1       -15       560

    Corresponding fuzzy confidence interval is one on the narrower
    interval, 0.9 elsewhere on the wider interval, and zero outside the
    wider interval, with values at jumps that are the average of the left
    and right limits

Sorry about the advert.  Couldn't resist the opportunity.

-- 
Charles Geyer
Professor, School of Statistics
University of Minnesota
charlie at stat.umn.edu



More information about the R-devel mailing list