[Rd] ctest package: wilcox.test() produces integer overflow (PR#2453)

Kurt Hornik Kurt.Hornik@wu-wien.ac.at
Tue Jan 14 18:48:08 2003


>>>>> bates  writes:

> This was filed as a bug report on the Debian r-base package.  It is
> more properly a bug report on the ctest package in R.  

> The default method for wilcox.test manipulates x and y without
> checking the class or data.class of these objects.  Possible solutions
> are
>  - create wilcox.test.factor (if appropriate)
>  - check the class and/or data.class of x and y in wilcox.test.default
>    and produce error messages or warnings for inappropriate objects
>  - coerce to numeric unconditionally (probably not a good idea)
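
The second option above could look roughly like the sketch below. The helper name check_numeric_arg is hypothetical and is not part of ctest; it only illustrates rejecting non-numeric input before any arithmetic is attempted.

```r
# Hypothetical helper (not part of ctest): stop early on non-numeric input.
check_numeric_arg <- function(v, name) {
    if (!is.numeric(v))
        stop(sprintf("'%s' must be numeric, not an object of class \"%s\"",
                     name, class(v)[1]))
    invisible(v)
}

f <- factor(c(2, 3))

# is.numeric() is FALSE for factors, so the check fails with a clear message
# instead of a later warning from Ops.factor.
res <- tryCatch(check_numeric_arg(f, "x"),
                error = function(e) conditionMessage(e))
print(res)
```

An error here, rather than a warning, would match the documented requirement that x and y be numeric; whether a warning would be preferable is a design choice this sketch does not settle.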

Hmm, but the documentation clearly says

  \item{x}{numeric vector of data values.}
  \item{y}{an optional numeric vector of data values.}

-k

> Martin Michlmayr <tbm@cyrius.com> writes:

>> Package: r-base
>> Version: 1.5.0-2 / 1.6.1.cvs.20030103-1
>> Severity: normal
>> 
>> I have some ordinal data and I wanted to perform a U-test.  However,
>> a problem occurred:
>> 
>> > x <- read.table("spss-3.txt", header=TRUE)
>> > a = factor(x$a)
>> > b = factor(x$b)
>> > summary(a)
>> 1     2     3     4     5     6 
>> 23900 20362 15238 10007  3399   472 
>> > summary(b)
>> 1     2     3     4     5     6 
>> 23809 20649 15069  9952  3415   484 
>> > wilcox.test(a, b)
>> 
>> Wilcoxon rank sum test with continuity correction
>> 
>> data:  a and b 
>> W = 5384330884, p-value = NA
>> alternative hypothesis: true mu is not equal to 0 
>> 
>> Warning messages: 
>> 1: "-" not meaningful for factors in: Ops.factor(x, mu) 
>> 2: NAs produced by integer overflow in: n.x * n.y 
>> 3: NAs produced by integer overflow in: n.x * n.y 
>> > 
>> 
>> There appear to be two issues.  The first is the NAs produced by
>> integer overflow.  Since the warning goes away when I use less data,
>> this looks like an R bug with big data sets:
>> 
>> 57:tbm@arborlon: ~] wc -l s
>> 40000 s
>> 
>> > summary(a)
>> 1     2     3     4     5     6 
>> 13034 11086  8341  5412  1869   257 
>> > summary(b)
>> 1     2     3     4     5     6 
>> 13034 11086  8341  5412  1869   257 
>> > wilcox.test(a, b)
>> 
>> Wilcoxon rank sum test with continuity correction
>> 
>> data:  a and b 
>> W = 1599920001, p-value = < 2.2e-16
>> alternative hypothesis: true mu is not equal to 0 
>> 
>> Warning message: 
>> "-" not meaningful for factors in: Ops.factor(x, mu) 
>> > 
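
The overflow can be reproduced from the counts alone. This is a minimal sketch, assuming (as the warning text suggests) that n.x and n.y are integer sample sizes; the values are the sums of the level counts shown by summary() in the two transcripts.

```r
# Sample sizes from the first transcript: the level counts sum to 73378.
n.x <- 73378L
n.y <- 73378L

# Largest representable integer in R.
int_max <- .Machine$integer.max

# Integer multiplication overflows and yields NA (with a warning).
prod_int <- suppressWarnings(n.x * n.y)

# Converting to double first avoids the overflow.
prod_dbl <- as.numeric(n.x) * as.numeric(n.y)

# Sample sizes from the second transcript: the counts sum to 39999,
# and the product still fits in an integer.
prod_small <- 39999L * 39999L

print(c(int_max = int_max, prod_dbl = prod_dbl, prod_small = prod_small))
```

The double-precision product equals the W reported in the first transcript, and the smaller product equals the W in the second, which is consistent with the warning disappearing once the product drops below .Machine$integer.max.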
>> 
>> 
>> However, I still don't know what the other warning means.  I don't have a
>> "-" in my data.  I reduced the data to 2 lines and the problem still
>> occurs:
>> 
>> > summary(a)
>> 2 3 
>> 1 1 
>> > summary(b)
>> 2 3 
>> 1 1 
>> > wilcox.test(a, b)
>> 
>> Wilcoxon rank sum test
>> 
>> data:  a and b 
>> W = 4, p-value = 0.3333
>> alternative hypothesis: true mu is not equal to 0 
>> 
>> Warning message: 
>> "-" not meaningful for factors in: Ops.factor(x, mu) 
>> > 
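
The "-" in the warning refers to the subtraction operator applied to a factor (the x - mu step named in the message), not to a character in the data. A small sketch of that behaviour, using a made-up two-element factor:

```r
a <- factor(c(2, 3))

# Arithmetic on a factor goes through Ops.factor, which returns NA
# and warns that "-" is not meaningful for factors.
d <- suppressWarnings(a - 0)

# as.numeric() on a factor gives the internal level codes, not the data.
codes <- as.numeric(a)

# Going through character recovers the original values.
vals <- as.numeric(as.character(a))

print(list(d = d, codes = codes, vals = vals))
```

Assuming the ordinal scores are meant to be treated as numbers, passing as.numeric(as.character(a)) and the equivalent for b to wilcox.test() would satisfy the documented requirement of numeric vectors.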
>> 
>> The file is:
>> 
>> 67:tbm@arborlon: ~] cat s
>> a   b
>> 2   4
>> 3   1
>> 68:tbm@arborlon: ~] 
>> 
>> 
>> I'm not an R expert, so this might be a pilot error; but I don't see
>> where.
>> 
>> 
>> -- System Information:
>> Debian Release: 3.0
>> Architecture: i386
>> Kernel: Linux regression 2.4.19-686 #1 Thu Aug 8 21:30:09 EST 2002 i686
>> Locale: LANG=en_US, LC_CTYPE=en_US
>> 
>> Versions of packages r-base depends on:
>> ii  r-base-core                   1.5.0-2    GNU R core of statistical computin
>> ii  r-base-html                   1.5.0-2    GNU R html docs for statistical co
>> ii  r-base-latex                  1.5.0-2    GNU R LaTeX docs for statistical c
>> 
>> -- no debconf information
>> 
>> 
>> -- 
>> Martin Michlmayr
>> tbm@cyrius.com
>> 
>> 

> -- 
> Douglas Bates                            bates@stat.wisc.edu
> Statistics Department                    608/262-2598
> University of Wisconsin - Madison        http://www.stat.wisc.edu/~bates/
