[Rd] Possible bug in fisher.test() (PR#14196)

(Ted Harding) Ted.Harding at manchester.ac.uk
Wed Jan 27 19:14:59 CET 2010


On 27-Jan-10 17:30:10, nhorton at smith.edu wrote:
># is there a bug in the calculation of the odds ratio in fisher.test?
># Nicholas Horton, nhorton at smith.edu Fri Jan 22 08:29:07 EST 2010
> 
> x1 = c(rep(0, 244), rep(1, 209))
> x2 = c(rep(0, 177), rep(1, 67), rep(0, 169), rep(1, 40))
> 
> or1 = sum(x1==1&x2==1)*sum(x1==0&x2==0)/
>      (sum(x1==1&x2==0)*sum(x1==0&x2==1))
> 
> library(epitools)
> or2 = oddsratio.wald(x1, x2)$measure[2,1]
> 
> or3 = fisher.test(x1, x2)$estimate
> 
># or1=or2 = 0.625276, but or3=0.6259267!
> 
> I'm running R 2.10.1 under Mac OS X 10.6.2.
> Nick

Not so. Look closely at ?fisher.test:

Value:
[...]
estimate: an estimate of the odds ratio.  Note that the
          _conditional_ Maximum Likelihood Estimate (MLE)
          rather than the unconditional MLE (the sample
          odds ratio) is used. Only present in the 2 by 2 case.

Your or1 (and presumably the epitools value also) is the sample OR.

The conditional MLE is the value of rho (the OR) that maximises
the probability of the table *conditional* on the margins.
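
To make this concrete, here is a minimal sketch that maximises the conditional likelihood directly for the 2x2 table from the original post. The helper name cond_loglik and the search interval for log(rho) are my own choices for illustration, not anything taken from the fisher.test() sources; the weights are built from dhyper(), which gives the conditional distribution of the first cell when rho = 1.

```r
# Conditional log-likelihood of log(rho) for a 2x2 table, given all margins.
# The first cell follows a non-central hypergeometric distribution.
# (cond_loglik is a hypothetical helper name used only for this sketch.)
cond_loglik <- function(log_rho, tab) {
  a <- tab[1, 1]                       # observed first cell
  m <- sum(tab[, 1])                   # first column total
  n <- sum(tab[, 2])                   # second column total
  k <- sum(tab[1, ])                   # first row total
  support <- max(0, k - n):min(k, m)   # values the first cell can take
  # Unnormalised log-weights: central hypergeometric times rho^x
  logw <- dhyper(support, m, n, k, log = TRUE) + support * log_rho
  # Log-probability of the observed cell (normalised via log-sum-exp)
  logw[support == a] - (max(logw) + log(sum(exp(logw - max(logw)))))
}

# Table from the original post: rows are x1 = 0, 1; columns are x2 = 0, 1
tab <- matrix(c(177, 169, 67, 40), nrow = 2)

# Maximise over log(rho); the interval c(-5, 5) is an assumed search range
fit <- optimize(function(lr) cond_loglik(lr, tab),
                interval = c(-5, 5), maximum = TRUE, tol = 1e-8)
cond_mle  <- exp(fit$maximum)
sample_or <- (177 * 40) / (67 * 169)

print(c(sample = sample_or, conditional = cond_mle))
```

The second value should agree with fisher.test(tab)$estimate up to the tolerance of the root-finder used there, while the first is the cross-product ratio.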

In this case the conditional MLE is only slightly larger than the
sample OR (by about 0.1%). For smaller tables the two will tend to
differ more, e.g.

  M1 <- matrix(c(4,7,17,18),nrow=2)
  M1
  #      [,1] [,2]
  # [1,]    4   17
  # [2,]    7   18

  (4*18)/(17*7)
  # [1] 0.605042

  fisher.test(M1)$estimate
  # odds ratio 
  #     0.6116235  ## (1.1% larger than sample OR)

  M2 <- matrix(c(1,2,4,5),nrow=2)
  M2
  #      [,1] [,2]
  # [1,]    1    4
  # [2,]    2    5

  (1*5)/(4*2)
  # [1] 0.625

  fisher.test(M2)$estimate
  # odds ratio 
  #     0.649423  ## (3.9% larger than sample OR)

The probability of a table matrix(c(a,b,c,d),nrow=2) given
the marginals (a+b),(a+c),(b+d) and hence also (c+d) is
a function of the odds ratio only. Again see ?fisher.test:

  "given all marginal totals fixed, the first element of
   the contingency table has a non-central hypergeometric
   distribution with non-centrality parameter given by
   the odds ratio (Fisher, 1935)."

The value of the odds ratio which maximises this probability (for
the given observed 'a') is in general not the sample OR.
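
One way to see this: the non-central hypergeometric distribution is an exponential family in log(rho), so the conditional MLE is the value of rho at which the expected first cell equals the observed 'a'. The sketch below checks that for the table M1 above; mean_nhyper is a hypothetical helper name, and the bracketing interval for uniroot() is an assumption chosen to be comfortably wide.

```r
# Mean of the first cell under the non-central hypergeometric distribution
# with odds ratio rho, given the margins of a 2x2 table.
# (mean_nhyper is a hypothetical helper name used only for this sketch.)
mean_nhyper <- function(rho, tab) {
  m <- sum(tab[, 1])                   # first column total
  n <- sum(tab[, 2])                   # second column total
  k <- sum(tab[1, ])                   # first row total
  support <- max(0, k - n):min(k, m)   # values the first cell can take
  logw <- dhyper(support, m, n, k, log = TRUE) + support * log(rho)
  p <- exp(logw - max(logw))
  p <- p / sum(p)                      # normalised probabilities
  sum(support * p)
}

M1 <- matrix(c(4, 7, 17, 18), nrow = 2)
a  <- M1[1, 1]
sample_or <- (4 * 18) / (17 * 7)

# Solve E[first cell | rho] = a for rho; interval c(0.01, 100) is assumed
cond_mle <- uniroot(function(r) mean_nhyper(r, M1) - a,
                    interval = c(0.01, 100), tol = 1e-10)$root

# Expected first cell at the sample OR versus at the conditional MLE
print(c(at_sample_or = mean_nhyper(sample_or, M1),
        at_cond_mle  = mean_nhyper(cond_mle, M1)))
print(c(sample = sample_or, conditional = cond_mle))
```

At the sample OR the expected first cell falls a little short of the observed 4, so the likelihood is still increasing there and the maximum lies at a slightly larger rho.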

Hoping this helps,
Ted.

--------------------------------------------------------------------
E-Mail: (Ted Harding) <Ted.Harding at manchester.ac.uk>
Fax-to-email: +44 (0)870 094 0861
Date: 27-Jan-10                                       Time: 18:14:57
------------------------------ XFMail ------------------------------


