[R] Tests on contingency tables

Prof Brian Ripley ripley at stats.ox.ac.uk
Tue Feb 15 15:08:55 CET 2005


I forgot to mention a crucial statistical point:

 	As you are doing this pairwise, remember Simpson's paradox.

or if you don't know about it, Google it (it is not really due to 
Simpson, but rather an instance of Stigler's Law of Eponymy).
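For concreteness, a sketch of the paradox on made-up counts (the classic kidney-stone illustration, not data from this thread): treatment A has the higher success rate within each stratum, yet the lower rate once the strata are pooled.

```r
## Simpson's paradox on hypothetical counts: A beats B within each
## stratum, but B beats A in the pooled table.
small <- matrix(c(81, 6, 234, 36), nrow = 2, byrow = TRUE,
                dimnames = list(trt = c("A", "B"), out = c("ok", "fail")))
large <- matrix(c(192, 71, 55, 25), nrow = 2, byrow = TRUE,
                dimnames = list(trt = c("A", "B"), out = c("ok", "fail")))
pooled <- small + large

success <- function(m) m[, "ok"] / rowSums(m)
success(small)    # A ~0.93 vs B ~0.87: A looks better
success(large)    # A ~0.73 vs B ~0.69: A looks better
success(pooled)   # A ~0.78 vs B ~0.83: B looks better -- the reversal
```

This is exactly why pairwise tests on marginal tables can mislead when a third factor drives both variables.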

On Tue, 15 Feb 2005, Prof Brian Ripley wrote:

> You can test independence via a log-linear model.  More importantly, you can 
> model that dependence and learn something useful about the data.
>
> I don't see your point here: the two factors are clearly highly dependent, so 
> who cares what the exact p-value is?  Did you try e.g. a mosaicplot?  I 
> suspect the dependence is obvious in any reasonable plot.
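A sketch of both suggestions, fitting the independence log-linear model with stats::loglin() and drawing a shaded mosaic plot; the table here is simulated as a stand-in for table(data$ins.f, data$ins.st):

```r
## Independence via a log-linear model, on simulated stand-in factors.
set.seed(1)
f1 <- factor(sample(letters[1:3], 500, replace = TRUE))
f2 <- factor(sample(LETTERS[1:4], 500, replace = TRUE))
tab <- table(f1, f2)

## margin = list(1, 2): main effects only, i.e. the independence model.
fit  <- loglin(tab, margin = list(1, 2), print = FALSE)
pval <- pchisq(fit$lrt, df = fit$df, lower.tail = FALSE)

mosaicplot(tab, shade = TRUE)  # residual shading shows where any dependence lives
```

The deviance fit$lrt against its df is the likelihood-ratio test of independence; the shaded mosaic plot shows which cells carry the dependence, which is usually more informative than the p-value itself.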
>
> On Tue, 15 Feb 2005, Jacques VESLOT wrote:
>
>> Dear all,
>> 
>> I have a dataset with qualitative variables (factors) and I want to test 
>> the null hypothesis of independence between two variables for each pair, 
>> using appropriate tests on contingency tables.
>> 
>> I first applied chisq.test and obtained dependence in almost all cases, with
>> extremely small p-values and warning messages.
>> 
>>> chisq.test(table(data$ins.f, data$ins.st))$p.val
>> [1] 4.811263e-100
>> Warning message:
>> Chi-squared approximation may be incorrect in: chisq.test(table(data$ins.f,
>> data$ins.st))
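That warning usually means some expected cell counts are small (the usual rule of thumb is below 5), which is easy to hit with a 10 x 8 table. The expected counts can be inspected directly; a sketch on a simulated stand-in table:

```r
## Inspect the expected counts behind chisq.test's warning (simulated data:
## 300 observations spread over 80 cells, so small expected counts are certain).
set.seed(1)
tab <- table(sample(1:10, 300, replace = TRUE),
             sample(1:8,  300, replace = TRUE))
res <- suppressWarnings(chisq.test(tab))
any(res$expected < 5)   # TRUE here: the chi-squared approximation is suspect
min(res$expected)
```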
>> 
>> I then turned to Fisher's Exact Test for Count Data, but I got only error
>> messages such as:
>> 
>> Error in fisher.test(table(data$ins.f, data$ins.st)) :
>>        FEXACT error 501.
>> The hash table key cannot be computed because the largest key
>> is larger than the largest representable int.
>> The algorithm cannot proceed.
>> Reduce the workspace size or use another algorithm.
>> 
>> maybe because the dimensions of the contingency tables are too large (?).
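When the exact network algorithm runs out of room like this, both tests offer a Monte Carlo fallback via simulate.p.value (fisher.test() also takes a workspace argument, but for a 10 x 8 table simulation is the more practical route). A sketch on a simulated stand-in table:

```r
## Monte Carlo p-values when exact enumeration is infeasible (simulated data
## standing in for table(data$ins.f, data$ins.st)).
set.seed(42)
tab <- table(sample(1:10, 2000, replace = TRUE),
             sample(1:8,  2000, replace = TRUE))
p1 <- chisq.test(tab,  simulate.p.value = TRUE, B = 10000)$p.value
p2 <- fisher.test(tab, simulate.p.value = TRUE, B = 10000)$p.value
c(chisq = p1, fisher = p2)
```

Simulated p-values are bounded below by 1/(B + 1), so report them as "< 1e-4" rather than 0 when B = 10000.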
>
> The help file does say:
>
>     Note this fails (with an error message) when the entries of the table
>     are too large.
>
> Note, the _entries_, not the dimensions.  The issue is how many tables need 
> to be enumerated.
>
>>> dim(table(data$ins.f, data$ins.st))
>> [1] 10  8
>> 
>> I then tried the likelihood-ratio G-statistic on the contingency table
>> (g.stats() from the hierfstat package), as follows:
>> 
>>> g.stats(data.frame(as.numeric(data$ins.f),as.numeric(data$ins.s)))$g.stats
>> [1] 486.1993
>> 
>> and plugged it into the chi-squared distribution function to get a p-value:
>> 
>>> 1-pchisq(486.199, df=63)
>> [1] 0
>> 
>> 
>> Is there a better way to perform this, or a more appropriate function
>> dedicated to tests on contingency tables with large dimensions?
>
>
> -- 
> Brian D. Ripley,                  ripley at stats.ox.ac.uk
> Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
> University of Oxford,             Tel:  +44 1865 272861 (self)
> 1 South Parks Road,                     +44 1865 272866 (PA)
> Oxford OX1 3TG, UK                Fax:  +44 1865 272595
>
>

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595



