[R] A question on chisq.test

Peter Dalgaard P.Dalgaard at biostat.ku.dk
Tue Jan 8 10:41:48 CET 2008


Rong Jian wrote:
> Dear all,
> I would like to do a goodness-of-fit test on my data to see if they follow a mixture of 2 Poisson distributions. I have small numbers for observed values. Most of them <5. The chisq.test gives warning message: Chi-squared approximation may be incorrect in: chisq.test(x , p = prob). However, the option sim=TRUE would suppress the warning message. Does that mean with the option sim=TRUE, the result from chisq.test is valid, even though most of the cell counts <5?
>   
Well, they are not invalid for _that_ reason!
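To illustrate, here is a minimal sketch using hypothetical simulated data and a `prob` vector that is fixed in advance rather than estimated (the Poisson mean 1.5 and the sample size 30 are arbitrary assumptions). `sim=TRUE` works because R partially matches it to the `simulate.p.value` argument; with it, `chisq.test` computes a Monte Carlo p-value by sampling from the multinomial distribution given by `p`, so the small-count warning is not issued.

```r
set.seed(1)

# Hypothetical setup: cell probabilities fixed in advance (not fitted to x).
# Cells are the values 0, 1, 2, 3, 4 and "5 or more".
lambda <- 1.5
prob <- c(dpois(0:4, lambda), ppois(4, lambda, lower.tail = FALSE))
x <- tabulate(pmin(rpois(30, lambda), 5) + 1, nbins = 6)  # observed counts

# Asymptotic p-value: several expected counts 30 * prob are below 5,
# so chisq.test warns. Capture the warning message instead of printing it.
msg <- tryCatch({ chisq.test(x, p = prob); "no warning" },
                warning = function(w) conditionMessage(w))
msg

# Monte Carlo p-value: samples of size sum(x) are drawn using prob, so the
# small expected counts do no harm and no warning is issued (df is reported as NA).
sim <- chisq.test(x, p = prob, simulate.p.value = TRUE, B = 10000)
sim$p.value
```

Because `prob` here does not depend on `x`, the simulated p-value is valid despite the small counts; the sketch says nothing about the case where `prob` was fitted to the same data.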

However, when you say p=prob, I bet that your "prob" comes from fitting
three parameters (two Poisson means and a mixing proportion) to your
data, and chisq.test cannot know that, so it assumes that "prob" was
known in advance. This is a problem whether or not the cell counts are
low, but very low expected cell counts can be problematic for other
reasons, so it might still be a good idea to pool some cells.
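A minimal sketch of the estimation issue and of pooling, under stated assumptions: the mixture parameters `w`, `lam1`, `lam2` are hypothetical stand-ins for fitted values, the data are simulated, and `pool_cells` is an illustrative helper (not part of R) that merges adjacent cells until each expected count is at least 5. Note that `chisq.test` still reports `length(pooled$obs) - 1` degrees of freedom, because it cannot know where `prob` came from.

```r
set.seed(2)

# Hypothetical mixture parameters, standing in for values that would
# normally be estimated from the data (three parameters in all).
w <- 0.6; lam1 <- 0.8; lam2 <- 3.5
n <- 60
x <- ifelse(runif(n) < w, rpois(n, lam1), rpois(n, lam2))  # hypothetical data

K <- max(x)
obs <- tabulate(x + 1, nbins = K + 1)  # counts of 0, 1, ..., K
pmf <- w * dpois(0:(K - 1), lam1) + (1 - w) * dpois(0:(K - 1), lam2)
prob <- c(pmf, 1 - sum(pmf))           # last cell is P(X >= K)

# Hypothetical helper: greedy left-to-right pooling so that every cell has
# expected count >= min_exp; a short final group is merged into its predecessor.
pool_cells <- function(obs, expct, min_exp = 5) {
  groups <- integer(length(obs))
  g <- 1L; acc <- 0
  for (i in seq_along(obs)) {
    groups[i] <- g
    acc <- acc + expct[i]
    if (acc >= min_exp && i < length(obs)) { g <- g + 1L; acc <- 0 }
  }
  if (acc < min_exp && g > 1L) groups[groups == g] <- g - 1L
  list(obs = as.vector(tapply(obs, groups, sum)),
       expct = as.vector(tapply(expct, groups, sum)))
}

pooled <- pool_cells(obs, n * prob)
res <- chisq.test(pooled$obs, p = pooled$expct / n)
res$parameter  # number of pooled cells minus 1; the fitted parameters are not counted
```

If `w`, `lam1`, and `lam2` had been estimated from `x`, the fitted expected counts would track the observed ones more closely than fixed ones would, so this p-value tends to be too large. Pooling removes the small-count problem but not that one.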

I would consider replacing chisq.test with a parametric bootstrap, in
which you repeatedly simulate from your fitted distribution, refit to
the simulated data, and calculate a chi-squared statistic, with suitable
pooling of cells. The statistic for your actual data is then compared
with the distribution of the simulated statistics.
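A minimal sketch of such a parametric bootstrap, assuming maximum-likelihood fitting with `optim` (its default Nelder-Mead method) and cells chosen once from the original fit so that each has expected count at least 5. The helper names (`fit_mix`, `upper_tail`, `choose_cells`, `chisq_stat`, `param_boot`), the starting values, `B = 199`, and the simulated data are all illustrative assumptions, not an established implementation.

```r
set.seed(3)

# Negative log-likelihood of a two-component Poisson mixture.
# par = (logit of weight w, log lambda1, log lambda2), so optim is unconstrained.
nll <- function(par, x) {
  w <- plogis(par[1])
  lam <- exp(par[2:3])
  -sum(log(w * dpois(x, lam[1]) + (1 - w) * dpois(x, lam[2])))
}

# Maximum-likelihood fit by optim's default Nelder-Mead method.
fit_mix <- function(x) {
  m <- max(mean(x), 0.1)
  opt <- optim(c(0, log(m) - 0.5, log(m) + 0.5), nll, x = x,
               control = list(maxit = 2000))
  list(w = plogis(opt$par[1]), lam = exp(opt$par[2:3]))
}

# P(X >= q) under the fitted mixture (vectorized over q).
upper_tail <- function(q, fit) {
  fit$w * ppois(q - 1, fit$lam[1], lower.tail = FALSE) +
    (1 - fit$w) * ppois(q - 1, fit$lam[2], lower.tail = FALSE)
}

# Lower bounds of pooled cells: walk up from 0, closing a cell once its expected
# count reaches min_exp, and stop when the remaining tail is too small to stand alone.
choose_cells <- function(fit, n, min_exp = 5) {
  lb <- 0; acc <- 0; v <- 0
  repeat {
    acc <- acc + n * (upper_tail(v, fit) - upper_tail(v + 1, fit))
    rest <- n * upper_tail(v + 1, fit)
    if (rest < min_exp) break
    if (acc >= min_exp) { lb <- c(lb, v + 1); acc <- 0 }
    v <- v + 1
  }
  lb
}

# Pearson chi-squared statistic on the pooled cells; the last cell is open-ended.
chisq_stat <- function(x, fit, lb) {
  obs <- tabulate(findInterval(x, lb), nbins = length(lb))
  up <- upper_tail(lb, fit)
  e <- length(x) * (up - c(up[-1], 0))
  sum((obs - e)^2 / e)
}

# Parametric bootstrap: simulate from the fitted mixture, refit, recompute.
param_boot <- function(x, B = 199) {
  n <- length(x)
  fit <- fit_mix(x)
  lb <- choose_cells(fit, n)
  t_obs <- chisq_stat(x, fit, lb)
  t_boot <- replicate(B, {
    comp1 <- runif(n) < fit$w
    xb <- ifelse(comp1, rpois(n, fit$lam[1]), rpois(n, fit$lam[2]))
    chisq_stat(xb, fit_mix(xb), lb)  # refit to the simulated data, same cells
  })
  list(statistic = t_obs, p.value = (1 + sum(t_boot >= t_obs)) / (B + 1),
       cells = lb)
}

# Hypothetical data from a two-Poisson mixture
x <- ifelse(runif(80) < 0.5, rpois(80, 1), rpois(80, 4))
res <- param_boot(x)
res$p.value
```

Keeping the cell boundaries fixed across replicates means every simulated statistic is computed on the same grouping as the observed one, and refitting inside each replicate is what accounts for the three estimated parameters. The p-value uses the usual Monte Carlo form (1 + number of simulated statistics at least as large as the observed one) / (B + 1).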

-- 
   O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark          Ph:  (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)                  FAX: (+45) 35327907



