[R] Tests on contingency tables

Tue Feb 15 15:57:08 CET 2005

Thanks a lot for your help!

Right! According to the tables, most factors do indeed look highly dependent.
But because I got strange p-values and warning messages when I tried the
chi-squared test, and because the Fisher's Exact Test function doesn't work
on my data, I wondered whether there were other functions to perform such
tests.

I will try testing independence via a log-linear model.
Is this code correct? (I can't quite work out what to put in the 'formula'
argument.)

> z <- table(data$fac1, data$fac2)
> names(dimnames(z)) <- c("fac1", "fac2")
> fm <- loglm(~ins+pl,z)
> fm
Call:
loglm(formula = ~ins + pl, data = z)

Statistics:
                      X^2 df P(> X^2)
Likelihood Ratio 286.1236 49        0
Pearson          450.5332 49        0
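
On the 'formula' question: as far as the MASS help page for loglm describes
it, when 'data' is a table of counts the terms on the right-hand side have to
be either the names of dimnames(z) or the integers 1, 2, ... standing for the
dimensions. So a formula such as ~ins+pl presumably only works when the table
dimensions carry those names. A minimal sketch with a made-up 3 x 3 table (the
counts and the names fac1/fac2 are illustrative assumptions, not the real
data):

```r
library(MASS)  # provides loglm()

# Hypothetical counts, for illustration only
z <- as.table(matrix(c(30, 10,  5,
                       10, 25, 10,
                        5, 10, 35), nrow = 3, byrow = TRUE))
dimnames(z) <- list(fac1 = c("a", "b", "c"), fac2 = c("x", "y", "z"))

# Independence model: main effects only, no fac1:fac2 interaction.
# The term names must match names(dimnames(z)).
fm <- loglm(~ fac1 + fac2, data = z)

# Same model, referring to the dimensions by number instead of by name
fm2 <- loglm(~ 1 + 2, data = z)

fm$df       # (3 - 1) * (3 - 1) = 4 degrees of freedom
fm$lrt      # likelihood-ratio (G) statistic
fm$pearson  # Pearson X^2, the same quantity chisq.test(z) reports
```

A large statistic relative to its degrees of freedom, as in the output above,
says the independence model fits badly, i.e. the two factors are dependent.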

Jacques

-----Original Message-----
From: Prof Brian Ripley [mailto:ripley at stats.ox.ac.uk]
Sent: Tuesday, 15 February 2005 17:41
To: Jacques VESLOT
Cc: R-HELP; jerome.goudet at unil.ch
Subject: Re: [R] Tests on contingency tables

You can test independence via a log-linear model.  More importantly, you
can model that dependence and learn something useful about the data.

I don't see your point here: the two factors are clearly highly dependent:
who cares what the exact p value is?   Did you do e.g. a mosaicplot as I
suspect the dependence is obvious in any reasonable plot?
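
A plot of the kind suggested here could be sketched as follows. The table is
made up for illustration (counts and the names fac1/fac2 are assumptions);
with the real data one would pass the actual table instead. The standardized
residuals are an optional extra that shows which cells depart most from
independence.

```r
# Hypothetical counts, for illustration only
z <- as.table(matrix(c(30, 10,  5,
                       10, 25, 10,
                        5, 10, 35), nrow = 3, byrow = TRUE))
dimnames(z) <- list(fac1 = c("a", "b", "c"), fac2 = c("x", "y", "z"))

# Mosaic plot: tile areas are proportional to the cell counts;
# shade = TRUE colours tiles by their residuals under independence
mosaicplot(z, shade = TRUE, main = "fac1 by fac2")

# Standardized residuals: cells with large absolute values drive the dependence
res <- chisq.test(z)$stdres
round(res, 2)
```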

On Tue, 15 Feb 2005, Jacques VESLOT wrote:

> Dear all,
>
> I have a dataset with qualitative variables (factors) and I want to test the
> null hypothesis of independence between two variables for each pair by using
> appropriate tests on contingency tables.
>
> I first applied chisq.test and obtained dependence in almost all cases with
> extremely small p-values and warning messages.
>
>> chisq.test(table(data$ins.f, data$ins.st))$p.val
> [1] 4.811263e-100
> Warning message:
> Chi-squared approximation may be incorrect in: chisq.test(table(data$ins.f,
> data$ins.st))
>
> I then turned to Fisher's Exact Test for Count Data, but I got only error
> messages such as:
>
> Error in fisher.test(table(data$ins.f, data$ins.st)) :
>        FEXACT error 501.
> The hash table key cannot be computed because the largest key
> is larger than the largest representable int.
> The algorithm cannot proceed.
> Reduce the workspace size or use another algorithm.
>
> maybe because the dimensions of the contingency tables are too large (?).

The help file does say

      Note this fails (with an error message) when the entries of the table
      are too large.

Note, the _entries_, not the dimensions.  The issue is how many tables
need to be enumerated.
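
When exact enumeration is not feasible, one possible workaround is a Monte
Carlo p-value. In current versions of R both chisq.test and fisher.test accept
simulate.p.value and B; whether that applies to the R version used in this
thread is not something the thread confirms. The sketch below uses a made-up
table (an assumption, not the real data). Note that the smallest p-value such
a simulation can return is 1/(B + 1).

```r
set.seed(1)  # make the simulation reproducible

# Hypothetical counts, for illustration only
z <- as.table(matrix(c(30, 10,  5,
                       10, 25, 10,
                        5, 10, 35), nrow = 3, byrow = TRUE))

B <- 9999  # number of simulated tables with the observed margins

# Monte Carlo versions of the two tests
p_chi <- chisq.test(z, simulate.p.value = TRUE, B = B)$p.value
p_fis <- fisher.test(z, simulate.p.value = TRUE, B = B)$p.value

c(chisq = p_chi, fisher = p_fis)
```

With dependence as strong as that reported in this thread, both values would
be expected to sit at or near the 1/(B + 1) floor, which again says little
beyond "the factors are clearly dependent".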

>> dim(table(data$ins.f, data$ins.st))
> [1] 10  8
>
> I then tried likelihood-ratio G-statistic on contingency table (g.stats()
> from hierfstat package), as follows:
>
>> g.stats(data.frame(as.numeric(data$ins.f),as.numeric(data$ins.s)))$g.stats
> [1] 486.1993
>
> and I plugged it into the chi-squared distribution function to get a p-value:
>
>> 1-pchisq(486.199, df=63)
> [1] 0
>
>
> Is there a better way to perform this, or a more appropriate function
> dedicated to tests on large contingency tables?
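
The p-value of 0 reported above is a rounding effect: 1 - pchisq(...) subtracts
a number indistinguishable from 1 in double precision. Asking pchisq for the
upper tail directly avoids that. The sketch below also computes the textbook
G statistic, G = 2 * sum(O * log(O / E)), by hand on a made-up table (the
counts are an assumption; it has not been checked here that g.stats from
hierfstat computes exactly this quantity). The df = 63 used above does match
(10 - 1) * (8 - 1) for a 10 x 8 table.

```r
# Hypothetical counts, for illustration only
z <- as.table(matrix(c(30, 10,  5,
                       10, 25, 10,
                        5, 10, 35), nrow = 3, byrow = TRUE))

# Expected counts under independence: row total * column total / grand total
expected <- outer(rowSums(z), colSums(z)) / sum(z)

# G statistic; empty cells contribute 0 to the sum
G  <- 2 * sum(ifelse(z > 0, z * log(z / expected), 0))
df <- (nrow(z) - 1) * (ncol(z) - 1)
pchisq(G, df = df, lower.tail = FALSE)

# The statistic quoted in the message above
p_naive <- 1 - pchisq(486.199, df = 63)                  # rounds to exactly 0
p_tail  <- pchisq(486.199, df = 63, lower.tail = FALSE)  # tiny but non-zero
log_p   <- pchisq(486.199, df = 63, lower.tail = FALSE, log.p = TRUE)  # natural log of the p-value
```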

--
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595