[R] strange fisher.test result

Tue Apr 3 00:54:24 CEST 2007

On 31-Mar-07 13:36:04, Williams Scott wrote:
> A simple question - using the following fishers test it appears that
> the P value is significant, but the CI includes 1. Is this result
> correct?
> 
>> data.50p10min <- matrix(c(16,15, 8, 24),nrow=2)
> 
>> fisher.test(data.50p10min)
> 
>         Fisher's Exact Test for Count Data
> data:  data.50p10min 
> p-value = 0.03941
> 
> alternative hypothesis: true odds ratio is not equal to 1 
> 
> 95 percent confidence interval:
>   0.9810441 10.7771597 
> 
> sample estimates:
> odds ratio 
>   3.138456 

Apologies to Scott for overlooking, in my previous response,
that he had in fact given his data in the line

  data.50p10min <- matrix(c(16,15, 8, 24),nrow=2)

(and to Rolf Turner for pointing this out to me nicely, in
a private mail).

I shall denote this by X0 <- matrix(c(16,15, 8, 24),nrow=2)

I've done a bit of investigating with the help of R, to try
to provide a well-formed explanation of reasons underlying
Scott's observation and his question.

>From the above, the marginal totals for his 2x2 table

  a   b   =   16    8

  c   d       15   24

are (rows then columns) 24,39,31,32

These fixed marginals mean that the whole table is determined
by the value of a. The following function P.FX() computes the
probabilities of all possible tables, conditional on the
marginal totals (it is much more transparent than the code
for the same purpose in fisher.test()):

P.FX <- function(margs,R){
  n  <- margs[1]+margs[2]
  a  <- (0:n)
  b  <- margs[1]-a
  c  <- margs[3]-a
  d  <- n-a-b-c
  ix <- (a>=0)&(b>=0)&(c>=0)&(d>=0)
  a <- a[ix]; b <- b[ix]; c <-c[ix]; d <- d[ix]

  P <- (R^a)/(exp(lgamma(a+1))*exp(lgamma(b+1))*
              exp(lgamma(c+1))*exp(lgamma(d+1)))
  P <- P/sum(P)
  return(cbind(a,b,c,d,P))
}

where 'margs' is the set of marginal totals (in the same row-col
order as above), and R = (alpha*delta)/(beta*gamma) is the
odds-ratio of the cell probabilities

  alpha   beta

  gamma   delta

Hence the probabilties for the Null Hypothesis R=1 are

> P <- P.FX(c(24,39,31,32),1)
> P
       a  b  c  d            P
 [1,]  0 24 31  8 6.714279e-11
 [2,]  1 23 30  9 5.550471e-09
 [3,]  2 22 29 10 1.914912e-07
 [4,]  3 21 28 11 3.702164e-06
 [5,]  4 20 27 12 4.535151e-05
 [6,]  5 19 26 13 3.767664e-04
 [7,]  6 18 25 14 2.215745e-03
 [8,]  7 17 24 15 9.496050e-03
 [9,]  8 16 23 16 3.026866e-02
[10,]  9 15 22 17 7.280305e-02
[11,] 10 14 21 18 1.334723e-01
[12,] 11 13 20 19 1.877552e-01
[13,] 12 12 19 20 2.034015e-01
[14,] 13 11 18 21 1.698738e-01
[15,] 14 10 17 22 1.092046e-01
[16,] 15  9 16 23 5.381095e-02
[17,] 16  8 15 24 2.017911e-02
[18,] 17  7 14 25 5.697630e-03
[19,] 18  6 13 26 1.193093e-03
[20,] 19  5 12 27 1.814060e-04
[21,] 20  4 11 28 1.943636e-05
[22,] 21  3 10 29 1.404269e-06
[23,] 22  2  9 30 6.383041e-08
[24,] 23  1  8 31 1.611427e-09
[25,] 24  0  7 32 1.678570e-11

Extract a and P by

a <- P[,1]; p <- P[,5]

Reminder: The 2-sided Fisher test p-value is
fisher.test(X0)$p.value
[1] 0.03940996

Looking at the code for fisher.test() (which takes some thought!),
the values of 'a' are (effectively) considered in order of
decreasing probability, and added in until the observed value
of 'a' is about to be included; the resulting p-value is the sum
of the probabilities of the 'a' values not included (amongst
which is the observed value itself):

iy <- order(p,decreasing=TRUE)
cbind(a=a[iy],p=p[iy],pval=rev(cumsum(rev(p[iy]))))

         a            p         pval
 [1,]   12 2.034015e-01 1.000000e+00
 [2,]   11 1.877552e-01 7.965985e-01
 [3,]   13 1.698738e-01 6.088432e-01
 [4,]   10 1.334723e-01 4.389695e-01
 [5,]   14 1.092046e-01 3.054972e-01
 [6,]    9 7.280305e-02 1.962926e-01
 [7,]   15 5.381095e-02 1.234896e-01
 [8,]    8 3.026866e-02 6.967862e-02
 [9,]   16 2.017911e-02 3.940996e-02 ###### Observed value
[10,]    7 9.496050e-03 1.923085e-02
[11,]   17 5.697630e-03 9.734798e-03
[12,]    6 2.215745e-03 4.037168e-03
[13,]   18 1.193093e-03 1.821423e-03
[14,]    5 3.767664e-04 6.283293e-04
[15,]   19 1.814060e-04 2.515629e-04
[16,]    4 4.535151e-05 7.015687e-05
[17,]   20 1.943636e-05 2.480536e-05
[18,]    3 3.702164e-06 5.369000e-06
[19,]   21 1.404269e-06 1.666837e-06
[20,]    2 1.914912e-07 2.625675e-07
[21,]   22 6.383041e-08 7.107624e-08
[22,]    1 5.550471e-09 7.245826e-09
[23,]   23 1.611427e-09 1.695355e-09
[24,]    0 6.714279e-11 8.392849e-11
[25,]   24 1.678570e-11 1.678570e-11

Here it can be seen that, with "extreme" meaning "towards the
bottom of the above listing", the probability of a result at
least as extreme as the observed value a=16 is 3.940996e-02,
the value actually returned by fisher.test() (see above).
The right-hand column gives the "upwards" cumulative sum of
the probabilities in the middle column.

Now for the 95% 2-sided Confidence Interval. Finding this is a
somewhat different matter, since the above 2-sided approach
involves ordering the outcomes "2-sidedly", whereas finding the
confidence limits involves a 1-sided approach for each limit.

For the upper limit of the 95% CI, we seek the smallest value
of the odds-ratio parameter such that the probability of an
'a' value less than or equal to the observed a=16 is at most
(1 - 0.95)/2 = 0.025. Even by hand, one is led quite quickly
to the result, varying the value of R on the lines of:

sum(P.FX(c(24,39,31,32),  1)[a<=16,5]) - 0.025
[1] 0.967907
sum(P.FX(c(24,39,31,32),  5)[a<=16,5]) - 0.025
[1] 0.2480870
sum(P.FX(c(24,39,31,32), 10)[a<=16,5]) - 0.025
[1] 0.008515804
.....
sum(P.FX(c(24,39,31,32), 10.75)[a<=16,5]) - 0.025
[1] 0.0002679926
.....
sum(P.FX(c(24,39,31,32), 10.77877506441498)[a<=16,5]) - 0.025
[1] 3.122502e-17

>From fisher.test(X0):
95 percent confidence interval:
  0.9810441 10.7771597 

Whereas, from the function P.FX():
sum(P.FX(c(24,39,31,32),10.7771597)[a<=16,5])
[1] 0.02501496

so there's a slight discrepancy here...

Now, for the upper limit of the 95% CI, we seek the largest
value of the odds-ratio parameter such that the probability
of an 'a' value greater than or equal to the observed a=16
is at most (1 - 0.95)/2 = 0.025. By a similar procedure:

sum(P.FX(c(24,39,31,32), 1.0)[a>=16,5]) - 0.025
[1] 0.002272143
> sum(P.FX(c(24,39,31,32), 0.95)[a>=16,5]) - 0.025
[1] -0.003457482
> sum(P.FX(c(24,39,31,32), 0.98)[a>=16,5]) - 0.025
[1] -0.0001202556
.....
sum(P.FX(c(24,39,31,32), 0.981032895344028)[a>=16,5]) - 0.025
[1] -4.510281e-17

which again is not quite equal to the "0.9810441" from the
fisher.test() for which, again:

sum(P.FX(c(24,39,31,32),0.9810441)[a>=16,5])
[1] 0.02500131

so again a slight discrepancy. (By the way, the above manual
iteration is a lot quicker than one might at first expect;
each of the above convergences took definitely less than
minutes, and there is something rewarding about observing
the convergence itself!)

I'm not sure what the reason for these slight discrepancies is.
Maybe there's a small difference in numerical result between
the functions dnhyper() and pnhyper() (defined internally in
fisher.test()) and the results from my function P.FX() above;
maybe there's s difference in "stopping rule" between the use
of uniroot() in fisher.test(), and my manual method above.

Anyway, to come back to Scott's original query, it is clear
from the above analysis that the way fisher.test() arrives at
its 2-sided p-value is not directly related to the way it
arrives at the upper and lower confidence limits.

For the p-value, it works outwards in both directions on
the value of 'a', until the observed value is just excluded,
using only the Null Hypothesis value of R, the odds ratio.
Then the p-value is the sum of the probabilities of the 
'a' values excluded on either side. The separate probabilties
on each side are not necessarily (and generally are not) equal.
Here the contribution from one side (in this case the lower
values of 'a') is smaller than the contribution from the
other side (the higher values of a), as one can see from the
one-sided sums for odds-ratio=1 for the two-sided test:

sum(P.FX(c(24,39,31,2), 1)[a<=7,5])
[1] 0.01213781

sum(P.FX(c(24,39,31,2), 1)[a>=16,5])
[1] 0.02727214

For the confidence limits, the p-value for each limit is
fixed to be exactly 0.025, and the odds-ratio is varied
until the sum of the ">=a (=16)" probabilities (for the
lower limit) is exactly 0.025, and again varied until the
sum of the "<=a (=16)" probabilites (for the upper limit)
is again exactly 0.025.

So it is not, in fact, a paradox that the p-value for the
null hypothesis OR=1 is less than 0.05, while the 95%
confidence interval for the OR includes the value 1.

This is not, of course, going to happen in most cases.
It is associated with the fact that, in the present case,
the p-value for the test is not much below 0.05.

Hoping this helps,
Ted.

--------------------------------------------------------------------
E-Mail: (Ted Harding) <ted.harding at nessie.mcc.ac.uk>
Fax-to-email: +44 (0)870 094 0861
Date: 02-Apr-07                                       Time: 23:54:20
------------------------------ XFMail ------------------------------