[R] Chi-squared test

Fri Nov 25 02:14:22 CET 2005

(Ted Harding) <Ted.Harding at nessie.mcc.ac.uk> writes:

> On 24-Nov-05 P Ehlers wrote:
> > Bianca Vieru-Dimulescu wrote:
> >> Hello,
> >> I'm trying to calculate a chi-squared test to see if my data are 
> >> different from the theoretical distribution or not:
> >> 
> >> chisq.test(rbind(c(79,52,69,71,82,87,95,74,55,78,49,60),
> >>                  c(80,80,80,80,80,80,80,80,80,80,80,80)))
> >> 
> >>       Pearson's Chi-squared test
> >> 
> >> data:  rbind(c(79, 52, 69, 71, 82, 87, 95, 74, 55, 78, 49, 60),
> >>              c(80, 80, 80, 80, 80, 80, 80, 80, 80, 80, 80, 80))
> >> X-squared = 17.6, df = 11, p-value = 0.09142
> >> 
> >> Is this correct? When I do the same thing in Excel I obtain
> >> a different p-value (1.65778E-14).
> >> 
> >> Thanks a lot,
> >> Bianca
> > 
> > It would be unusual to have 12 observed frequencies all equal to 80.
> > So I'm guessing that you have a 12-category variable and want to
> > test its fit to a discrete uniform distribution. I assume that your
> > frequencies are
> > 
> > x <- c(79, 52, 69, 71, 82, 87, 95, 74, 55, 78, 49, 60)
> > 
> > Then just use
> > 
> > chisq.test(x)
> > 
> > (see the help page).
> > 
> > (If those 80's are expected cell frequencies, they should sum to
> > sum(x) = 851.)
> > 
> > I don't know what Excel does.
> > 
> > Peter
> > 
> > Peter Ehlers
> > University of Calgary
> 
> I'm rather with Peter on this question! I've tried to infer what
> you're really trying to do.
> 
> My a-priori plausible hypothesis was that you have
> 
>   k<-12
> 
> independent observations which have equal expected values
> 
>   m<-rep(80,k)
> 
> and are observed as
> 
>   x<-c(79,52,69,71,82,87,95,74,55,78,49,60)
> 
> On this basis, a chi-squared test Sum((O-E)^2/E) gives
> 
>   C2<-sum(((x-m)^2)/m)
> 
> so C2 = 41.1375, and on this hypothesis the chi-squared would
> have k=12 degrees of freedom. Then:
> 
>   1-pchisq(C2,k)
> ## [1] 4.647553e-05
> 
> which is nowhere near the 1.65778E-14 you report from Excel.
> Also, the result from Peter's chisq.test(x) is p = 0.0006468,
> even further away.
> 
> So this makes me really wonder what you are doing.
> 
> The nearest I can get to your Excel result 1.65778E-14 is
> 
>   ix<-x<m
>   prod(2*ppois(x[ix],m[ix]))*prod(2*(1-ppois(x[!ix],m[!ix])))
> ## 2.831963e-14
> 
> which is based on the guess that independent 2-sided Poisson
> tests of agreement between O and E have been carried out on each
> component, and the final P-value is the product of these P-values.
> 
> But this doesn't make a lot of sense from a statistical point
> of view, so it's time to stop guessing!
> 
> Please tell us what hypothesis you are testing, what sort of
> distribution the x-values are supposed to have, what the
> repeated "80" values represent, and also please tell us
> in detail what you asked Excel to do!
> 
> Then, perhaps, a useful reply can be made.

I think what Excel does is outlined here:

http://www.gifted.uconn.edu/siegle/research/ChiSquare/chiexcel.htm

(Notice the helpful wizard which in step 2 claims that you are doing a
test for independence, not for a given distribution.)
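
To make the distinction concrete, here is a minimal sketch of what the
original two-row call computes. It treats the row of 80s as a second
set of observed counts and tests independence in the 2 x 12 table. The
names obs, expd, stat, df, pval and res are mine, for illustration only.

```r
# Counts from the thread; the row of 80s is treated as a second sample.
obs <- rbind(c(79, 52, 69, 71, 82, 87, 95, 74, 55, 78, 49, 60),
             rep(80, 12))

# Expected counts under independence:
# row total * column total / grand total.
expd <- outer(rowSums(obs), colSums(obs)) / sum(obs)

stat <- sum((obs - expd)^2 / expd)          # Pearson statistic
df   <- (nrow(obs) - 1) * (ncol(obs) - 1)   # (2 - 1) * (12 - 1) = 11
pval <- pchisq(stat, df, lower.tail = FALSE)

# The hand calculation should agree with chisq.test() on the same matrix.
res <- chisq.test(obs)
c(manual = stat, builtin = unname(res$statistic))
```

The 11 degrees of freedom here come from (rows - 1) * (columns - 1),
not from a goodness-of-fit test with 12 categories, even though the
two numbers happen to coincide.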

This would seem to coincide with Peter E's guess. The example on that
page matches chisq.test(c(10,3,2)).
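
For the counts in this thread, the goodness-of-fit version can be
sketched as follows, assuming the hypothesis really is a discrete
uniform distribution over the 12 categories. The variable names (fit,
E, stat, pval, fit80) are only illustrative.

```r
x <- c(79, 52, 69, 71, 82, 87, 95, 74, 55, 78, 49, 60)
k <- length(x)

# On a plain vector, chisq.test() tests against equal probabilities.
fit <- chisq.test(x)

# The same calculation by hand: every expected count is sum(x) / k.
E    <- rep(sum(x) / k, k)
stat <- sum((x - E)^2 / E)
pval <- pchisq(stat, df = k - 1, lower.tail = FALSE)

# Passing the 80s as weights and letting R rescale them to
# probabilities (each becomes 1/12) gives the same test.
fit80 <- chisq.test(x, p = rep(80, k), rescale.p = TRUE)

c(builtin = fit$p.value, manual = pval, rescaled = fit80$p.value)
```

Without rescale.p = TRUE the last call stops with an error, because
the supplied values do not sum to 1.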

I believe that the expected values are expected (!) to sum to the
total counts. If they do not, I guess that Excel is numb-skulled
enough to compute sum((O-E)^2/E) anyway and look up its p-value
with k-1 DF. Still gets you nowhere near 1.6e-14 though.
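
A quick sketch of that guess in R, for anyone who wants to check it.
This only reproduces my conjecture about what Excel might be doing; it
is not a documented description of Excel's behaviour, and p_guess is
just a name chosen for illustration.

```r
x <- c(79, 52, 69, 71, 82, 87, 95, 74, 55, 78, 49, 60)
k <- length(x)
m <- rep(80, k)   # expected counts as entered; they sum to 960, not 851

# Pearson statistic using the unadjusted expected counts
# (41.1375, as computed earlier in the thread).
C2 <- sum((x - m)^2 / m)

# Conjectured lookup: upper tail of chi-squared with k - 1 = 11 df.
p_guess <- pchisq(C2, df = k - 1, lower.tail = FALSE)

# Compare with the value reported from Excel.
p_guess > 1.65778e-14
```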

-- 
   O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark          Ph:  (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)                  FAX: (+45) 35327907