[R] Is it safe? Cochran etc

Dan Bolser dmb at mrc-dunn.cam.ac.uk
Sat Oct 9 19:18:09 CEST 2004


I have the following contingency table

dat <- matrix(c(1,506,13714,878702),nr=2)

And I want to test whether there is an association between events

A:{a,not(a)} and B:{b,not(b)}

        | b   | not(b) |
--------+-----+--------+
 a      |   1 |  13714 |
--------+-----+--------+
 not(a) | 506 | 878702 |
--------+-----+--------+

I am worried that prop.test and chisq.test are not valid given the low
counts and low probabilities associated with 'success' in each category.
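
One way to look at this worry is through the expected counts under independence, which is what the usual rule of thumb attributed to Cochran refers to. The following is a minimal sketch, assuming the criterion is "all expected counts at least 5" (that threshold is an assumption here, not something stated in this post):

```r
# Contingency table from above (rows: a, not(a); columns: b, not(b))
dat <- matrix(c(1, 506, 13714, 878702), nrow = 2)

# Expected counts under independence:
# (row total * column total) / grand total, for every cell
expected <- outer(rowSums(dat), colSums(dat)) / sum(dat)
print(expected)

# Smallest expected count (the (a, b) cell, roughly 7.8)
min_expected <- min(expected)

# Rule-of-thumb check, with the assumed threshold of 5
rule_ok <- all(expected >= 5)

# chisq.test() returns the same matrix in its $expected component
same_as_chisq <- isTRUE(all.equal(expected, chisq.test(dat)$expected,
                                  check.attributes = FALSE))
```

Note that the rule is about expected counts, not observed ones, so an observed count of 1 does not by itself violate it.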

Is it safe to use them, and what is the alternative, given that
fisher.test can't handle this data? Hold the phone...

I just found fisher.test can handle this data if the test is one-tailed
and not two-tailed.

I don't understand the difference between chisq.test, prop.test and
fisher.test when the hybrid=1 option is used for the fisher.test.

I was using the binomial distribution to test the 'extremity' of the
observed data, but now I think I know why that is inappropriate. However,
with the binomial (and its approximation) at least I know what I am
doing, and I can do it in Perl easily...

More generally, how should I calculate Fisher's exact test in Perl (i.e.
what are its principles)? And when is it safe to approximate Fisher's test
with the chi-squared test?
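
On the principle for a 2x2 table: conditional on the row and column totals, the top-left count follows a hypergeometric distribution, and the p-value is a sum of hypergeometric probabilities. The following is a minimal sketch in R, assuming the two-sided p-value is defined as the total probability of all tables no more likely than the observed one. `fisher_2x2` is a hypothetical helper name, and the small relative tolerance is an assumption to guard against floating-point ties:

```r
# Hypothetical helper: exact p-values for a 2x2 table via the
# hypergeometric distribution of the top-left cell.
fisher_2x2 <- function(tab) {
  m <- sum(tab[, 1])   # total of column 1 ("white balls")
  n <- sum(tab[, 2])   # total of column 2 ("black balls")
  k <- sum(tab[1, ])   # total of row 1 (number of balls drawn)
  x <- tab[1, 1]       # observed top-left count

  # All values the top-left cell can take with the margins fixed
  support <- max(0, k - n):min(k, m)
  probs   <- dhyper(support, m, n, k)
  p_obs   <- dhyper(x, m, n, k)

  list(
    less      = phyper(x, m, n, k),                          # P(X <= x)
    greater   = phyper(x - 1, m, n, k, lower.tail = FALSE),  # P(X >= x)
    # Sum over tables at most as probable as the observed one
    two.sided = sum(probs[probs <= p_obs * (1 + 1e-7)])
  )
}

dat <- matrix(c(50, 60, 100, 100), nrow = 2)
res <- fisher_2x2(dat)
print(res$two.sided)
```

Each hypergeometric probability is a ratio of binomial coefficients, so a port to Perl would mainly need sums of log-factorials to avoid overflow with large counts.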

I cannot get insight into this problem...

How come if I do...

dat <- matrix(c(50,60,100,100),nr=2)

prop.test(dat)$p.value
chisq.test(dat)$p.value
fisher.test(dat)$p.value

I get 

[1] 0.5173269
[1] 0.5173269
[1] 0.4771358

When I looked at the binomial distribution and the normal approximation
thereof with similar counts, I never saw a p-value difference greater
than 0.004.
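
For comparison, the chi-squared p-value for the second table can be recomputed by hand. This is a minimal sketch, assuming chisq.test applies the Yates continuity correction by default for 2x2 tables; the variable names are illustrative only:

```r
dat <- matrix(c(50, 60, 100, 100), nrow = 2)

# Expected counts under independence
expected <- outer(rowSums(dat), colSums(dat)) / sum(dat)

# Yates continuity correction: shrink each |observed - expected| by 0.5
# (or by less, if the smallest difference is below 0.5)
yates <- min(0.5, abs(dat - expected))
stat  <- sum((abs(dat - expected) - yates)^2 / expected)

# Chi-squared p-value with 1 degree of freedom
p_manual <- pchisq(stat, df = 1, lower.tail = FALSE)
print(p_manual)

# The same test without the continuity correction, for comparison
p_uncorrected <- chisq.test(dat, correct = FALSE)$p.value
print(p_uncorrected)
```

If the assumption holds, `p_manual` should match the 0.5173269 reported by chisq.test and prop.test above, while the Fisher value comes from the exact hypergeometric distribution and not from a chi-squared approximation.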

I am so fed up with this problem :(



