[R] Analyzing three-way contingency tables with many zero cells

James Bull james.bull at monash.edu
Tue Jun 14 04:20:06 CEST 2011


Hi all,

I am trying to analyze the following data. The first three columns are 
categorical variables (colors of three traits for a peripatus species) 
and the last column is the count of individuals in each three-way 
classification. I wish to test whether the three traits vary 
independently or are correlated across individuals, i.e. this is a 
basic three-way contingency table question.

Segment,Body,Pattern,Count
1,1,1,91
1,1,2,139
1,1,3,2
1,1,4,195
1,2,1,0
1,2,2,0
1,2,3,0
1,2,4,0
1,3,1,1
1,3,2,1
1,3,3,0
1,3,4,0
2,1,1,5
2,1,2,34
2,1,3,6
2,1,4,80
2,2,1,2
2,2,2,0
2,2,3,0
2,2,4,14
2,3,1,2
2,3,2,3
2,3,3,6
2,3,4,376
3,1,1,1
3,1,2,0
3,1,3,0
3,1,4,0
3,2,1,0
3,2,2,0
3,2,3,0
3,2,4,0
3,3,1,0
3,3,2,1
3,3,3,0
3,3,4,71

I can run the following code, but am unsure whether the log-linear model 
is appropriate given the large number of zero cells in my table. I've 
looked for permutation-based equivalents to avoid this issue, but can 
only find mh_test in the coin package, which I don't think is 
appropriate, as none of my factors can be considered a blocking factor.
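
One permutation-based route that needs no blocking factor is a Monte Carlo test of mutual independence. The following is only a minimal sketch under stated assumptions: the counts are retyped from the table above, the statistic is Pearson's X2 against the product of the one-way margins, and the names counts, cells, ind, x2 and B are illustrative. Shuffling Body and Pattern independently while holding Segment fixed keeps all three one-way margins constant, so the expected counts need computing only once.

```r
# Counts from the table above, in row order (Pattern varies fastest,
# then Body, then Segment).
counts <- c(91, 139, 2, 195,  0, 0, 0, 0,   1, 1, 0, 0,
            5, 34, 6, 80,     2, 0, 0, 14,  2, 3, 6, 376,
            1, 0, 0, 0,       0, 0, 0, 0,   0, 1, 0, 71)
cells <- expand.grid(Pattern = 1:4, Body = 1:3, Segment = 1:3)
cells[] <- lapply(cells, factor)          # treat the codes as categories

# Expand to one row per individual.
ind <- cells[rep(seq_len(nrow(cells)), counts), ]
n <- nrow(ind)

# Expected counts under mutual independence (fixed by the margins).
expected <- outer(outer(table(ind$Segment), table(ind$Body)),
                  table(ind$Pattern)) / n^2

# Pearson X2 of a Segment x Body x Pattern table against 'expected'.
x2 <- function(seg, body, pat) {
  obs <- table(seg, body, pat)
  sum((obs - expected)^2 / expected)
}

set.seed(1)                               # arbitrary seed, for repeatability
B <- 999                                  # number of random permutations
obs_stat  <- x2(ind$Segment, ind$Body, ind$Pattern)
perm_stat <- replicate(B, x2(ind$Segment, sample(ind$Body),
                             sample(ind$Pattern)))

# Monte Carlo p-value; the observed table counts as one permutation.
p_value <- (1 + sum(perm_stat >= obs_stat)) / (B + 1)
p_value
```

This tests only complete (mutual) independence of the three traits; questions about particular pairs or conditional independence would need a different shuffling scheme.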

# Import data
COR <- read.table('Independence of traits.csv', header=TRUE, sep=',', strip.white=TRUE)

# Transform data into an appropriate table form
COR.tab<-xtabs(Count~Segment+Body+Pattern,COR)

# To run G2 log-linear model test; not sure if appropriate given the
# many empty cells
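# The trait columns are read in as integer codes, so wrap them in factor()
# to fit them as categorical rather than numeric predictors
COR.glmF <- glm(Count ~ factor(Segment) * factor(Body) * factor(Pattern), family=poisson, data=COR)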
anova(COR.glmF, test="Chisq")
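
To judge how sparse the table is relative to the independence hypothesis, the mutual-independence model can also be fitted directly and its expected counts inspected. This is a rough sketch using loglin() from base R's stats package, with the counts retyped from the table above; the names counts, tab and fit are illustrative, and the cut-off of 5 for "small" expected counts is only the usual rule of thumb, not a hard threshold.

```r
# Counts from the table above, in row order (Pattern varies fastest,
# then Body, then Segment).
counts <- c(91, 139, 2, 195,  0, 0, 0, 0,   1, 1, 0, 0,
            5, 34, 6, 80,     2, 0, 0, 14,  2, 3, 6, 376,
            1, 0, 0, 0,       0, 0, 0, 0,   0, 1, 0, 71)
tab <- array(counts, dim = c(4, 3, 3),
             dimnames = list(Pattern = 1:4, Body = 1:3, Segment = 1:3))

# Mutual independence: fit only the three one-way margins.
fit <- loglin(tab, margin = list(1, 2, 3), fit = TRUE, print = FALSE)

fit$lrt                                   # G2 (likelihood-ratio) statistic
fit$df                                    # residual degrees of freedom
pchisq(fit$lrt, fit$df, lower.tail = FALSE)  # asymptotic p-value

# Sparsity diagnostics: the asymptotic p-value is doubtful when many
# expected counts are small.
sum(tab == 0)                             # observed zero cells
sum(fit$fit < 5)                          # cells with expected count below 5
```

Note that the degrees of freedom reported by loglin() are not adjusted for zero cells, so the chi-squared reference distribution should be read with caution here.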

Any advice would be greatly appreciated.

Many thanks in advance,

James Bull

(Monash University, Australia)


