[Rd] The two chisq.test p values differ when the contingency
table (PR#3896)
Kurt.Hornik at wu-wien.ac.at
Thu Aug 21 21:20:43 MEST 2003
>>>>> dmurdoch writes:
>> Date: Wed, 16 Jul 2003 01:27:25 +0200 (MET DST)
>> From: shitao at ucla.edu
>>> x
>> [,1] [,2]
>> [1,] 149 151
>> [2,] 1 8
>>> c2x<-chisq.test(x, simulate.p.value=T, B=100000)$p.value
>>> for(i in (1:20)){c2x<-c(c2x,chisq.test(x,
>> simulate.p.value=T,B=100000)$p.value)}
>>> c2tx<-chisq.test(t(x), simulate.p.value=T, B=100000)$p.value
>>> for(i in (1:20)){c2tx<-c(c2tx,chisq.test(t(x), simulate.p.value=T,
>> + B=100000)$p.value)}
>>> cbind(c2x,c2tx)
>> c2x c2tx
>> [1,] 0.03711 0.01683
>> [2,] 0.03717 0.01713
> The problem is in ctest/R/chisq.test.R, where the p-value is
> calculated as
> STATISTIC <- sum((x - E) ^ 2 / E)
> PARAMETER <- NA
> PVAL <- sum(tmp$results >= STATISTIC) / B
> Here tmp$results is a collection of simulated chisquare values, but
> because of different rounding, the statistics corresponding to tables
> equal to the observed table are slightly smaller than the value
> calculated in STATISTIC, and effectively the p-value is calculated as
> PVAL <- sum(tmp$results > STATISTIC) / B
> instead.
> What's the appropriate fix here?
> PVAL <- sum(tmp$results > STATISTIC - .Machine$double.eps^0.5) / B
> works on this example, but is there something better?
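
A minimal sketch of the underlying rounding issue (not the committed fix): the same chi-square terms summed in different orders need not give bit-identical floating-point results, so a strict `>=` comparison against a value computed in another order can fail by one ulp. The table below is the one from the report; `s1` and `s2` may or may not differ on a given platform, but any difference is tiny.

```r
## Table from the report: rows are groups, columns are outcomes.
x <- matrix(c(149, 1, 151, 8), nrow = 2)

## Expected counts under independence, as in chisq.test.
E <- outer(rowSums(x), colSums(x)) / sum(x)

## The four chi-square contributions.
terms <- (x - E)^2 / E

s1 <- sum(terms)                           # summed in storage order
s2 <- sum(sort(terms, decreasing = TRUE))  # largest terms first

## Both are the same statistic mathematically; any difference is
## pure floating-point rounding from the summation order.
c(s1, s2, abs(s1 - s2))
```

This is why the simulated statistics for tables identical to the observed one can come out marginally below `STATISTIC`, silently turning `>=` into `>`.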
Argh. Very interesting ...
I think it works to use
STATISTIC <- sum(sort((x - E) ^ 2 / E, decreasing = TRUE))
instead: this sums the largest terms first, so any rounding error can
only make the computed value slightly smaller than the exact one, which
is the safe direction for the `>=` comparison.
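
A hedged sketch of the idea behind the fix, using `r2dtable` in place of the internal C simulation that `chisq.test` actually uses (an assumption for illustration; the real code draws its tables differently):

```r
x <- matrix(c(149, 1, 151, 8), nrow = 2)
E <- outer(rowSums(x), colSums(x)) / sum(x)

## Sum the squared deviations largest-first: any rounding error now
## nudges STATISTIC downward, so simulated tables identical to the
## observed one still satisfy `>= STATISTIC`.
STATISTIC <- sum(sort((x - E)^2 / E, decreasing = TRUE))

set.seed(1)
B <- 10000
## r2dtable draws random tables with the same row and column margins.
sims <- r2dtable(B, rowSums(x), colSums(x))
stats <- vapply(sims, function(tab) sum((tab - E)^2 / E), numeric(1))

PVAL <- sum(stats >= STATISTIC) / B
PVAL
```

With the sorted summation, `chisq.test(x, ...)` and `chisq.test(t(x), ...)` should no longer give systematically different simulated p-values, since transposing the table no longer changes which side of the rounding error `STATISTIC` lands on.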
Fix committed to r-devel. Thanks for looking into this.
-k