[R] 2x2 Contingency table with much sampling zeroes

Charles C. Berry cberry at tajo.ucsd.edu
Tue Oct 20 18:09:25 CEST 2009


On Tue, 20 Oct 2009, Etienne Toffin wrote:

> Hi,
>
> I'm analyzing experimental results where two different events ("T1" and "T2") 
> can occur or not during an experiment. I made my experiments with one factor 
> ("Substrate") with two levels ("Sand" and "Clay").
> I would like to know whether or not "Substrate" affects the occurrence 
> probability of the two events.


It is not clear to me what you mean by 'affects the occurrence ...'.

This sounds like 'Independence of Substrate from the two other variables', 
which is a 3 degree of freedom hypothesis (at least in the example you 
give).

Is that what you are after, or are only some of those contrasts 
interesting?
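
To make the 'independence of Substrate from the two other variables' reading concrete, here is a minimal sketch that lays the posted counts out as Substrate by the joint (T1, T2) outcome and counts the nominal degrees of freedom. The names `Outcome`, `tab` and `nominal_df` are only illustrative, not anything from the original post.

```r
# Data as posted in the original message (one row per cell of the 2 x 2 x 2 table).
DF <- data.frame(
  Subs = c(rep("Sand", 4), rep("Clay", 4)),
  T1   = rep(c("YES", "YES", "NO", "NO"), 2),
  T2   = rep(c("YES", "NO", "YES", "NO"), 2),
  Freq = c(12, 5, 0, 7, 24, 1, 0, 0)
)

# Collapse T1 and T2 into a single four-level joint outcome (illustrative name).
DF$Outcome <- interaction(DF$T1, DF$T2, sep = "/")

# Substrate (rows) by joint outcome (columns).
tab <- xtabs(Freq ~ Subs + Outcome, data = DF)
print(tab)

# Nominal degrees of freedom for independence in an r x c table.
nominal_df <- (nrow(tab) - 1) * (ncol(tab) - 1)
print(nominal_df)
```

The count is nominal: it takes all four joint outcomes as possible and makes no allowance for empty columns.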



> Moreover, for each condition I would like to 
> test the heterogeneity of my experimental contingency table with a 
> theoretical one (from simulations).
>

Do you mean you have some prior values for the counts or proportions? If 
so a standard goodness of fit test should do. If not, you need to describe 
the problem in more detail.
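
If the simulations do give you proportions, a goodness of fit test might look like the sketch below. The values in `p_theory` are made up purely for illustration and must be replaced by the real theoretical proportions. The Monte Carlo p-value is one possible way around small expected counts, not the only one.

```r
# Observed counts for one substrate (Sand), in the order
# T1/T2 = YES/YES, YES/NO, NO/YES, NO/NO, taken from the posted data.
observed <- c(12, 5, 0, 7)

# HYPOTHETICAL proportions standing in for the simulation output;
# replace them with the real theoretical values. They must sum to 1.
p_theory <- c(0.50, 0.20, 0.05, 0.25)

# Goodness of fit test against the given proportions. The simulated
# p-value avoids leaning on the chi-square approximation when
# expected counts are small.
set.seed(1)  # only to make the simulated p-value reproducible
gof <- chisq.test(observed, p = p_theory, simulate.p.value = TRUE, B = 2000)
print(gof)
```

A theoretical proportion of exactly zero for a cell would give an expected count of zero, so such cells need separate thought before running this.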


> However, my problem is that several cells have sampling zeroes. My 
> experiments can't be done again to fill these cells. Thus Chi-square 
> requirements are not fulfilled and I have to find another statistical method.
>

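Sampling zeroes in the cells are not a problem as long as the marginal 
tables being fitted do not have such zeroes. Depending on the hypotheses 
you want to test, the marginal tables may be OK. 'Substrate' is OK. 'T1 by 
T2' needs a closer look: in the data you give, T1 = NO with T2 = YES is 
zero for both substrates, so that marginal cell is empty and the test 
implied by those margins has fewer than the nominal 3 degrees of freedom.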
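
A quick way to look at the margins in question is sketched below, using the posted counts. The object names (`tab3`, `subs_margin`, `t1t2_margin`) are illustrative only.

```r
# Data as posted in the original message.
DF <- data.frame(
  Subs = c(rep("Sand", 4), rep("Clay", 4)),
  T1   = rep(c("YES", "YES", "NO", "NO"), 2),
  T2   = rep(c("YES", "NO", "YES", "NO"), 2),
  Freq = c(12, 5, 0, 7, 24, 1, 0, 0)
)

# Full three-way table: Subs x T1 x T2.
tab3 <- xtabs(Freq ~ Subs + T1 + T2, data = DF)

# Marginal tables: Substrate totals, and T1 by T2 summed over Substrate.
subs_margin <- margin.table(tab3, 1)
t1t2_margin <- margin.table(tab3, c(2, 3))
print(subs_margin)
print(t1t2_margin)

# Locate any marginal cells that are zero.
print(which(subs_margin == 0))
print(which(t1t2_margin == 0, arr.ind = TRUE))
```

Any cell flagged here has a fitted value of zero under a model that fits that margin, so it contributes nothing to the test.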


> After spending hours searching for a solution, I thought I could use 
> loglinear model to answer my questions, but :
> - I'm not sure I can use loglinear model = do I fulfill the required 
> conditions ?


Have you studied the Agresti reference listed in the ?loglin help page? 
I'll bet it addresses 'the required conditions' - which go to the sampling 
distribution of the counts.

> - would this method address my hypothesis ?
> - I'm not sure I really understand how I have to use loglin()…
>

run

 	example(loglin)

and reread

 	?loglin

The example is the same setup as you have here (albeit with more degrees 
of freedom), so you might emulate it.
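
For orientation, emulating that example on the posted counts might look like the sketch below. It fits the margins [Subs] and [T1, T2], i.e. Substrate independent of the two events jointly. This is only one possible model choice, and the object names are illustrative.

```r
# Data as posted in the original message.
DF <- data.frame(
  Subs = c(rep("Sand", 4), rep("Clay", 4)),
  T1   = rep(c("YES", "YES", "NO", "NO"), 2),
  T2   = rep(c("YES", "NO", "YES", "NO"), 2),
  Freq = c(12, 5, 0, 7, 24, 1, 0, 0)
)

# Three-way table with dimensions 1 = Subs, 2 = T1, 3 = T2.
tab3 <- xtabs(Freq ~ Subs + T1 + T2, data = DF)

# Fit the margins [Subs] and [T1, T2] by iterative proportional fitting.
fit <- loglin(tab3, margin = list(1, c(2, 3)), fit = TRUE, print = FALSE)

# Fitted counts, likelihood ratio statistic and the df reported by loglin().
print(fit$fit)
print(fit$lrt)
print(fit$df)

# Reference p-value using the reported df. The df reported by loglin()
# is not adjusted for zeros, so treat this as a rough guide only.
print(pchisq(fit$lrt, df = fit$df, lower.tail = FALSE))
```

With counts this small the chi-square reference distribution is itself doubtful, so the p-value above is a starting point for discussion rather than a final answer.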


> Here is the data frame of my results.
>
> DF <- data.frame(
>   Subs = c(rep("Sand", 4), rep("Clay", 4)),
>   T1   = rep(c("YES", "YES", "NO", "NO"), 2),
>   T2   = rep(c("YES", "NO", "YES", "NO"), 2),
>   Freq = c(12, 5, 0, 7, 24, 1, 0, 0)
> )
>
> What do you think of these data? Can I use any statistical method to test my 
> hypothesis? Any advice?

Recruit a statistician to your committee. Questions like these are better 
hashed out in front of a blackboard than over the internet.

HTH,

Chuck


>
> Thanks,
>
> Etienne Toffin
>
>
> -------------------------------------------------------------------
> Etienne Toffin, PhD Student
> Unit of Social Ecology
> Université Libre de Bruxelles, CP 231
> Boulevard du Triomphe
> B-1050 Brussels
> Belgium
>
> Tel: +32(0)2/650.55.30
> Fax: +32(0)2/650.57.67
> http://www.ulb.ac.be/sciences/use/toffin.html
>

Charles C. Berry                            (858) 534-2098
                                             Dept of Family/Preventive Medicine
E mailto:cberry at tajo.ucsd.edu	            UC San Diego
http://famprevmed.ucsd.edu/faculty/cberry/  La Jolla, San Diego 92093-0901
