[R] Fisher's exact test at < 2.2e-16

Thu Dec 17 15:16:54 CET 2009

Søren Faurby wrote:
> In an effort to select the most appropriate number of clusters in a
> mixture analysis I am comparing the expected and actual membership of
> individuals in various clusters using the Fisher's exact test. I aim
> for the model with the lowest possible p-value, but I frequently get
> p-values below 2.2e-16 and therefore do not get exact p-values with
> standard Fisher's exact tests in R.
> 
> Does anybody know if there is a version of Fisher's exact test in
> any package which can handle lower probabilities, or have other
> suggestions as to how I can compare the probabilities?
> 
> I am for instance comparing the following two:
> 
> dat2<-matrix(c(29,0,29,0,12,0,18,0,0,29,0,16,0,19), nrow=2)
> fisher.test(dat2, workspace=30000000)
> 
> dat3<-matrix(c(29,0,0,29,0,0,12,0,0,17,0,1,0,29,0,0,15,1,0,0,19),
> nrow=3)
> fisher.test(dat3, workspace=30000000)
> 
> Which both result in p-value < 2.2e-16
> 
> Kind regards, Søren

The direct answer is that it is primarily an issue of printing conventions;
the p-value stored in the result is not truncated:

> fisher.test(dat2, workspace=30000000)$p.value
[1] 5.384278e-44
> fisher.test(dat3, workspace=30000000)$p.value
[1] 5.883133e-58
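
For reference, a minimal sketch of where the "< 2.2e-16" comes from and how to
work with the stored value instead. As far as I know, the print method formats
p-values with format.pval(), whose eps argument defaults to
.Machine$double.eps; the variable names res and p are only for illustration.

```r
# Table from the question above.
dat2 <- matrix(c(29,0,29,0,12,0,18,0,0,29,0,16,0,19), nrow = 2)
res  <- fisher.test(dat2, workspace = 30000000)

p <- res$p.value            # stored as an ordinary double
p < .Machine$double.eps     # below machine epsilon, hence the "< 2.2e-16" in print()
log10(p)                    # order of magnitude, convenient for comparisons
format.pval(p, eps = 0)     # formatted without the "< eps" cutoff
```

Comparing log10(p) between tables is usually easier to read than comparing the
raw values.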

However, I'm not sure (a) what is the influence of underflow in the
calculation of such tiny p-values, or (b) whether the p-value is a
sensible metric for comparing clustering models at all.

-- 
   O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark      Ph:  (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)              FAX: (+45) 35327907