[R] Automating binning for chisq.test()

Duncan Murdoch murdoch at stats.uwo.ca
Fri Oct 12 19:54:05 CEST 2007


On 10/12/2007 1:16 PM, D. R. Evans wrote:
> The standard chisq.test() and fisher.test() functions, when applied to
> two distributions (to determine whether the same underlying
> distribution applies to both), require one to pre-bin the
> distributions.
> 
> Is there a library function (either built-in or in a package) that
> acts more like the ks.test() function, in that one can simply pass the
> two distributions and have it do the necessary binning as well as the
> actual statistical test?
> 
> (Yes, you can accuse me of laziness: I just don't fancy trying to
> figure out a routine that would make sure that there are more than 5
> samples in each of the expected bins before applying the chi-squared
> test. It seems too much like re-inventing an elementary wheel that
> must have been invented by someone else.)
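For the two-sample case asked about above, one possible approach is to bin on equally spaced quantiles of the pooled sample and run a chi-squared test of homogeneity on the resulting 2 x k table. This is a minimal sketch, not a built-in or CRAN function: binned_chisq_2sample() and its min_expected parameter are hypothetical names, and the bin count is chosen so that the smallest expected cell count is roughly min_expected.

```r
# Sketch: chi-squared test of homogeneity for two samples, binning on
# quantiles of the pooled data so that every bin has enough counts.
# binned_chisq_2sample() is a hypothetical helper, not an R or CRAN function.
binned_chisq_2sample <- function(x, y, min_expected = 5) {
  pooled <- c(x, y)
  # The smallest expected cell count is roughly min(nx, ny) / nbins,
  # so choose nbins to keep that near or above min_expected.
  nbins <- floor(min(length(x), length(y)) / min_expected)
  if (nbins < 2) stop("samples too small for at least two bins")
  # Interior cutpoints at equally spaced quantiles of the pooled sample;
  # unique() guards against ties producing duplicate breaks.
  inner <- unique(quantile(pooled, probs = (1:(nbins - 1)) / nbins,
                           names = FALSE))
  breaks <- c(-Inf, inner, Inf)
  bins <- cut(pooled, breaks = breaks)
  group <- rep(c("x", "y"), c(length(x), length(y)))
  # 2 x k contingency table: rows are samples, columns are bins.
  tab <- table(group, bins)
  chisq.test(tab)
}

set.seed(42)
res <- binned_chisq_2sample(rnorm(150), rnorm(120, mean = 0.3))
print(res)
```

Because the bin edges are estimated from the data, the resulting p-value should be treated as approximate, and chisq.test() may still warn if a bin falls slightly below the target expected count. ks.test(x, y) remains the binning-free alternative.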

If you have a quantile function q() for the distribution, a sample size 
of N, and want expected counts of 5 in each bin, just calculate the 
cutpoints as

nbins <- floor(N/5)
cutpoints <- c(-Inf, q( (1:(nbins-1)/nbins)), Inf)
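A fuller sketch of this recipe for the one-sample case, assuming the hypothesised distribution is the standard normal so that qnorm() plays the role of q(), and using a simulated sample. Each bin then has probability 1/nbins under the null hypothesis, which is passed to chisq.test() through its p argument. If any parameters of the distribution were estimated from the same data, the degrees of freedom reported here would be too large.

```r
# Sketch: chi-squared goodness-of-fit test against a fully specified
# distribution, using equal-probability bins with expected count 5.
# Assumption: the hypothesised distribution is N(0, 1), so qnorm plays q().
set.seed(1)
x <- rnorm(200)            # sample to be tested
N <- length(x)

nbins <- floor(N / 5)      # expected count N / nbins is at least 5 per bin
stopifnot(nbins >= 2)      # need at least two bins for a meaningful test
cutpoints <- c(-Inf, qnorm((1:(nbins - 1)) / nbins), Inf)

# cut() assigns each observation to a bin; table() counts them and keeps
# empty bins, because cut() returns a factor with all levels.
observed <- table(cut(x, breaks = cutpoints))

# Under the null hypothesis every bin has probability 1 / nbins.
result <- chisq.test(observed, p = rep(1 / nbins, nbins))
print(result)
```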

Duncan Murdoch
