[R] Tests on contingency tables

Prof Brian Ripley ripley at stats.ox.ac.uk
Tue Feb 15 14:40:57 CET 2005


You can test independence via a log-linear model.  More importantly, you 
can model that dependence and learn something useful about the data.
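A minimal sketch of the log-linear approach. The poster's data are not available, so the built-in HairEyeColor table (collapsed to Hair x Eye) stands in for it. The independence model Freq ~ Hair + Eye is fitted as a Poisson GLM. Its residual deviance is the likelihood-ratio statistic G^2 for independence on (r - 1)(c - 1) degrees of freedom, and its residuals show where the dependence lies. loglm() in the MASS package fits the same kind of model directly to a table.

```r
# Stand-in data (assumption): Hair x Eye counts, 4 x 4, summed over Sex
tab <- margin.table(HairEyeColor, c(1, 2))
df_tab <- as.data.frame(tab)   # columns Hair, Eye, Freq

# Independence model: main effects only, Poisson errors with log link
fit_indep <- glm(Freq ~ Hair + Eye, family = poisson, data = df_tab)

# Residual deviance = G^2 for independence on (r - 1)(c - 1) df
G2 <- deviance(fit_indep)
df_resid <- df.residual(fit_indep)
p_value <- pchisq(G2, df = df_resid, lower.tail = FALSE)
cat("G^2 =", round(G2, 2), "on", df_resid, "df; p =", format.pval(p_value), "\n")

# Modelling the dependence: Pearson residuals per cell, laid out as the table
# (as.data.frame() varies Hair fastest, so matrix() fills in the same order)
print(round(matrix(residuals(fit_indep, type = "pearson"), nrow = nrow(tab),
                   dimnames = dimnames(tab)), 1))
```

Large positive residuals mark combinations that occur far more often than independence predicts, which is usually more informative than the p-value itself.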

I don't see your point here: the two factors are clearly highly dependent, so
who cares what the exact p value is?  Did you do, e.g., a mosaicplot?  I
suspect the dependence is obvious in any reasonable plot.
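A minimal sketch of such a plot, again using the built-in HairEyeColor table as a stand-in for the poster's data. With shade = TRUE, mosaicplot() draws an extended mosaic plot whose tiles are coloured by their Pearson residuals under independence, so strong dependence shows up as heavily shaded tiles. For the original data the call would be along the lines of mosaicplot(table(data$ins.f, data$ins.st), shade = TRUE).

```r
# Stand-in data (assumption): Hair x Eye counts, 4 x 4
tab <- margin.table(HairEyeColor, c(1, 2))

# Tile areas are proportional to the counts; shade = TRUE colours each tile
# by its Pearson residual from the independence model
mosaicplot(tab, shade = TRUE, main = "Hair vs Eye colour")

# The same residuals as numbers: (observed - expected) / sqrt(expected)
res <- chisq.test(tab)$residuals
print(round(res, 1))
```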

On Tue, 15 Feb 2005, Jacques VESLOT wrote:

> Dear all,
>
> I have a dataset with qualitative variables (factors) and I want to test the
> null hypothesis of independence between two variables for each pair by using
> appropriate tests on contingency tables.
>
> I first applied chisq.test and obtained dependence in almost all cases with
> extremely small p-values and warning messages.
>
>> chisq.test(table(data$ins.f, data$ins.st))$p.val
> [1] 4.811263e-100
> Warning message:
> Chi-squared approximation may be incorrect in: chisq.test(table(data$ins.f,
> data$ins.st))
>
> I then turned to Fisher's Exact Test for Count Data, but I got only error
> messages such as:
>
> Error in fisher.test(table(data$ins.f, data$ins.st)) :
>        FEXACT error 501.
> The hash table key cannot be computed because the largest key
> is larger than the largest representable int.
> The algorithm cannot proceed.
> Reduce the workspace size or use another algorithm.
>
> maybe because the dimensions of the contingency tables are too large (?).

The help file does say

      Note this fails (with an error message) when the entries of the table
      are too large.

Note, the _entries_, not the dimensions.  The issue is how many tables 
need to be enumerated.
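Since the difficulty is the number of tables to enumerate, one way round it is a Monte Carlo p-value: with simulate.p.value = TRUE (for tables larger than 2 x 2), fisher.test() samples tables with the observed margins instead of enumerating them all. chisq.test() has the same option, which also sidesteps the chi-squared approximation that produced the warning. A minimal sketch on the HairEyeColor stand-in table; B = 2000 is an arbitrary choice, and the smallest attainable p-value is 1/(B + 1).

```r
# Stand-in data (assumption): 4 x 4 table with n = 592, large entries
tab <- margin.table(HairEyeColor, c(1, 2))

set.seed(1)  # simulated p-values depend on the random draws

# Monte Carlo Fisher test: simulate tables with the observed row and
# column totals rather than enumerating every such table
ft <- fisher.test(tab, simulate.p.value = TRUE, B = 2000)
print(ft$p.value)

# Simulated p-value for the Pearson chi-squared statistic
ct <- chisq.test(tab, simulate.p.value = TRUE, B = 2000)
print(ct$p.value)
```

When no simulated table is as extreme as the observed one, both functions report 1/(B + 1), which says only that the true p-value is very small.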

>> dim(table(data$ins.f, data$ins.st))
> [1] 10  8
>
> I then tried the likelihood-ratio G-statistic on the contingency table (g.stats()
> from the hierfstat package), as follows:
>
>> g.stats(data.frame(as.numeric(data$ins.f),as.numeric(data$ins.s)))$g.stats
> [1] 486.1993
>
> and I plugged it into the chi-squared distribution function to get a p-value:
>
>> 1-pchisq(486.199, df=63)
> [1] 0
>
>
> Is there a better way to perform this, or a more appropriate function
> dedicated to tests on contingency tables with many rows and columns?
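On the last step of the question: 1 - pchisq(x, df) returns exactly 0 once the upper-tail probability drops below about 1e-16, because the subtraction is done in double precision. Asking pchisq() for the upper tail directly keeps the tiny value. The sketch below also computes the likelihood-ratio statistic G^2 = 2 sum O log(O/E) in base R without hierfstat; whether g.stats() uses exactly this definition is an assumption not checked here. The values 486.199 and df = 63 = (10 - 1)(8 - 1) are taken from the post, and g_test() is a hypothetical helper written for illustration.

```r
# Upper tail directly: avoids the cancellation in 1 - pchisq(...)
print(1 - pchisq(486.199, df = 63))                   # prints 0
print(pchisq(486.199, df = 63, lower.tail = FALSE))   # tiny but non-zero

# Hypothetical helper: likelihood-ratio (G) test of independence, r x c table
g_test <- function(tab) {
  tab <- as.matrix(tab)
  expected <- outer(rowSums(tab), colSums(tab)) / sum(tab)
  nz <- tab > 0                     # cells with O = 0 contribute 0 to G^2
  G2 <- 2 * sum(tab[nz] * log(tab[nz] / expected[nz]))
  df <- (nrow(tab) - 1) * (ncol(tab) - 1)
  list(statistic = G2, df = df,
       p.value = pchisq(G2, df = df, lower.tail = FALSE))
}

res <- g_test(margin.table(HairEyeColor, c(1, 2)))   # stand-in data
print(res)
```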


-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595