[R] chisq.test() as a goodness of fit test

(Ted Harding) Ted.Harding at nessie.mcc.ac.uk
Thu Jan 13 19:30:58 CET 2005


On 13-Jan-05 Vito Ricci wrote:
> Dear R-Users,
> 
> How can I use chisq.test() as a goodness of fit test?
> Reading the man page, I have some doubts that this kind of test
> is available with this function. Am I wrong?
> 
> 
> X2 = sum((O-E)^2/E)
> 
> O = empirical (observed) frequencies
> E = expected frequencies calculated from the model (such as a
> normal distribution)
> 
> See:
> http://www.itl.nist.gov/div898/handbook/eda/section3/eda35f.htm
> for X2 used as a goodness of fit test.

It is not conspicuous in "?chisq.test", though it is in fact
the case that chisq.test() can perform the sort of test you
are looking for. No doubt this is because so much of the page
is devoted to the contingency-table case.

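However, if you use it in the form

  chisq.test(x, p = p)

where x is a vector of counts in "bins" and p is a vector,
of the same length as x, of the probabilities (summing to 1)
that a random observation will fall in the various bins, then
it is that sort of test. Note that p must be given by name:
the second positional argument of chisq.test() is y, so
chisq.test(x, p) would instead cross-tabulate x against p as
a contingency table.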
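As a minimal illustration (the counts below are made up for the example, not real data), here is a test of whether a six-sided die is fair, with p passed by name:

```r
# Hypothetical counts from 120 rolls of a die (invented numbers)
x <- c(25, 17, 15, 23, 24, 16)
# Bin probabilities under the null hypothesis of a fair die
p <- rep(1/6, 6)

res <- chisq.test(x, p = p)
res$statistic   # X-squared = 5  (each E = 20, sum of (O-E)^2 = 100)
res$parameter   # df = length(x) - 1 = 5
res$p.value
```

Here every expected count is 20, so chisq.test() gives no small-expected-count warning.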

So, for example, if you divide the range of X into k intervals
(-Inf,X1], (X1,X2], ... , (X[k-2],X[k-1]], (X[k-1],Inf),
let N1, N2, ... , Nk be the numbers of observations in these
intervals, and let

  x = c(N1,...,Nk)

  p = c(pnorm(X1),
        pnorm(c(X2,...,X[k-1])) - pnorm(c(X1,...,X[k-2])),
        1 - pnorm(X[k-1]) )

then

  chisq.test(x, p = p)

will test the goodness of fit of the standard normal
distribution N(0,1) (for another mean and sd, pass them to
pnorm()).
(Note that the above is schematic pseudo-R code, not real
R code!)
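A runnable version of the same idea, as a sketch: the sample z is simulated and the cut points b are arbitrary choices standing in for X1, ..., X[k-1]. Taking successive differences of pnorm() over c(-Inf, b, Inf) gives exactly pnorm(X1), pnorm(X2)-pnorm(X1), ..., 1-pnorm(X[k-1]).

```r
set.seed(1)                       # for reproducibility
z <- rnorm(200)                   # simulated data standing in for your sample

# Interior cut points X1, ..., X[k-1]; here k = 6 bins (arbitrary choice)
b <- c(-1.5, -0.5, 0, 0.5, 1.5)
k <- length(b) + 1

# Counts N1, ..., Nk in (-Inf,X1], (X1,X2], ..., (X[k-1],Inf)
# (cut() uses right-closed intervals by default)
x <- as.vector(table(cut(z, breaks = c(-Inf, b, Inf))))

# Bin probabilities under N(0,1) as successive differences of the CDF
p <- diff(pnorm(c(-Inf, b, Inf)))

res <- chisq.test(x, p = p)
res
```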

However, this use of chisq.test(x, p = p) is limited (as far
as I can see) to the case where no parameters have been
estimated in choosing the distribution from which p is
calculated. It always uses k-1 degrees of freedom, which is
the wrong number if the distribution has been estimated from
the data. I cannot see any provision in the documentation for
chisq.test() for specifying either the degrees of freedom or
the number of parameters estimated for p.
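To see this concretely, a small sketch (simulated data, arbitrary cut points): the mean and sd are estimated from the sample, so r = 2, yet the degrees of freedom that chisq.test() reports are still k - 1.

```r
set.seed(2)
z <- rnorm(200, mean = 10, sd = 3)   # simulated sample (illustrative only)
b <- c(6, 8, 10, 12, 14)             # interior cut points (arbitrary)

x <- as.vector(table(cut(z, breaks = c(-Inf, b, Inf))))

# Bin probabilities from a normal whose mean and sd are estimated from z,
# i.e. r = 2 parameters have been fitted to the data
p <- diff(pnorm(c(-Inf, b, Inf), mean = mean(z), sd = sd(z)))

res <- chisq.test(x, p = p)
res$parameter   # df = length(x) - 1 = 5; the 2 fitted parameters are ignored
```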

So in the latter case you are better off doing it directly.
This is no more difficult, since the hard work is in
calculating the elements of p anyway. After that, with E=N*p,
where N = sum(O) is the total number of observations,

  X2 <- sum(((O-E)^2)/E)

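has approximately the chi-squared distribution with
df = k-1-r degrees of freedom, where k is the number of "bins"
and r is the number of parameters that have been estimated
(with nothing estimated this is k-1, which is what
chisq.test() uses). The P-value is then 1-pchisq(X2,df).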
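A minimal sketch of the direct computation, under the same illustrative assumptions as before (simulated sample, arbitrary cut points, mean and sd estimated so r = 2), using the usual df = k - 1 - r:

```r
set.seed(2)
z <- rnorm(200, mean = 10, sd = 3)    # simulated sample (illustrative)
b <- c(6, 8, 10, 12, 14)              # interior cut points (arbitrary)
k <- length(b) + 1                    # number of bins
r <- 2                                # parameters estimated: mean and sd

O <- as.vector(table(cut(z, breaks = c(-Inf, b, Inf))))  # observed counts
N <- sum(O)

# Bin probabilities from the fitted normal, then expected counts
p <- diff(pnorm(c(-Inf, b, Inf), mean = mean(z), sd = sd(z)))
E <- N * p

X2 <- sum((O - E)^2 / E)
df <- k - 1 - r
pval <- pchisq(X2, df, lower.tail = FALSE)   # same as 1 - pchisq(X2, df)
c(X2 = X2, df = df, p.value = pval)
```

X2 here is the same statistic chisq.test(O, p = p) would report; only the degrees of freedom differ. Strictly, the k-1-r rule holds when the parameters are estimated from the binned counts; with mean(z) and sd(z) taken from the raw data, the null distribution lies between chi-squared with k-1-r and k-1 d.f., so the P-value is approximate.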

Best wishes,
Ted.


--------------------------------------------------------------------
E-Mail: (Ted Harding) <Ted.Harding at nessie.mcc.ac.uk>
Fax-to-email: +44 (0)870 094 0861  [NB: New number!]
Date: 13-Jan-05                                       Time: 18:30:58
------------------------------ XFMail ------------------------------



