[R] Chi-squared test

Marc Schwartz MSchwartz at mn.rr.com
Fri Nov 25 03:52:24 CET 2005


On Thu, 2005-11-24 at 18:50 -0700, P Ehlers wrote:
> Marc Schwartz wrote:
> > On Thu, 2005-11-24 at 21:55 +0000, Ted Harding wrote:
> > 
> >>On 24-Nov-05 P Ehlers wrote:
> >>
> >>>Bianca Vieru- Dimulescu wrote:
> >>>
> >>>>Hello,
> >>>>I'm trying to calculate a chi-squared test to see if my data are 
> >>>>different from the theoretical distribution or not:
> >>>>
> >>>>chisq.test(rbind(c(79,52,69,71,82,87,95,74,55,78,49,60),
> >>>>                 c(80,80,80,80,80,80,80,80,80,80,80,80)))
> >>>>
> >>>>      Pearson's Chi-squared test
> >>>>
> >>>>data:  rbind(c(79, 52, 69, 71, 82, 87, 95, 74, 55, 78, 49, 60),
> >>>>             c(80, 80, 80, 80, 80, 80, 80, 80, 80, 80, 80, 80))
> >>>>X-squared = 17.6, df = 11, p-value = 0.09142
> >>>>
> >>>>Is this correct? If I'm doing the same thing using Excel I obtained
> >>>>a different value of p.. (1.65778E-14)
> >>>>
> >>>>Thanks a lot,
> >>>>Bianca
> >>>
> >>>It would be unusual to have 12 observed frequencies all equal to 80.
> >>>So I'm guessing that you have a 12-category variable and want to
> >>>test its fit to a discrete uniform distribution. I assume that your
> >>>frequencies are
> >>>
> >>>x <- c(79, 52, 69, 71, 82, 87, 95, 74, 55, 78, 49, 60)
> >>>
> >>>Then just use
> >>>
> >>>chisq.test(x)
> >>>
> >>>(see the help page).
> >>>
> >>>(If those 80's are expected cell frequencies, they should sum to
> >>>sum(x) = 851.)
> >>>
> >>>I don't know what Excel does.
> >>>
> >>>Peter
> >>>
> >>>Peter Ehlers
> >>>University of Calgary
> >>
> >>I'm rather with Peter on this question! I've tried to infer what
> >>you're really trying to do.
> >>
> >>My a-priori plausible hypothesis was that you have
> >>
> >>  k<-12
> >>
> >>independent observations which have equal expected values
> >>
> >>  m<-rep(80,k)
> >>
> >>and are observed as
> >>
> >>  x<-c(79,52,69,71,82,87,95,74,55,78,49,60)
> >>
> >>On this basis, a chi-squared test Sum((O-E)^2/E) gives
> >>
> >>  C2<-sum(((x-m)^2)/m)
> >>
> >>so C2 = 41.1375, and on this hypothesis the chi-squared would
> >>have k=12 degrees of freedom. Then:
> >>
> >>  1-pchisq(C2,k)
> >>## [1] 4.647553e-05
> >>
> >>which is nowhere near the 1.65778E-14 you report from Excel.
> >>Also, the result from Peter's chisq.test(x) is p = 0.0006468,
> >>even further away.
> > 
> > 
> > It's late on Turkey Day here, but shouldn't that be:
> > 
> > 
> >>1 - pchisq(C2, k - 1)  # 11 df
> > 
> > [1] 2.282202e-05
> > 
> > which is what I get using OO.org's Calc 2.0 with the CHITEST function
> > using the two vectors as the observed (x) and expected (m) values. I
> > also get this result from Gnumeric 1.4.3 using the same CHITEST
> > function.
> > 
> [snip]
> 
> Marc, it's a bit sad to see that OO.org copies Excel's behaviour
> to a _fault_. As Peter D. points out, we would expect the expected
> frequencies and the observed frequencies to sum to the same value.
> Excel (and Calc) blithely ignores that. R, OTOH, gives an error
> message when the probabilities don't sum to 1.

Peter, yes indeed. If you search the archives, you will find threads here:

http://finzi.psych.upenn.edu/R/Rhelp02a/archive/18179.html

and

http://finzi.psych.upenn.edu/R/Rhelp02a/archive/18474.html

where some discussion on this occurred within the context of rounding
issues and IEEE 754 compliance. Calc has truly copied Excel's behavior
to a fault, since the intention is to be a "drop-in" replacement for the
latter. At least Gnumeric has not done so in all cases, though it has
here.

Calc and Gnumeric indicate that CHITEST is a test for independence, not
for goodness of fit. I did not pay attention to Excel's description, but
presumably it is similar. Clearly, though, none of these applications
checks that the observed and expected counts sum to the same total.
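
To make the difference concrete, here is a minimal sketch in R. It puts the
spreadsheet-style calculation (which, judging from the matching p-value
above, is what CHITEST appears to do with these two vectors) next to the
goodness-of-fit test from chisq.test(). The names x, m and C2 follow Ted's
message; p_sheet and gof are names introduced only for this illustration.

```r
# Observed counts from the thread. The expected value of 80 per cell is the
# original poster's figure, not something derived from the data.
x <- c(79, 52, 69, 71, 82, 87, 95, 74, 55, 78, 49, 60)
m <- rep(80, length(x))

# Spreadsheet-style calculation: nothing checks that the totals agree.
c(observed = sum(x), expected = sum(m))   # 851 versus 960

C2 <- sum((x - m)^2 / m)                  # 41.1375, as in Ted's message
p_sheet <- pchisq(C2, df = length(x) - 1, lower.tail = FALSE)

# Goodness of fit to a discrete uniform distribution: the expected counts
# are sum(x) / 12 each, so observed and expected totals agree by construction.
gof <- chisq.test(x)

p_sheet        # the 11 df value quoted above, about 2.28e-05
gof$statistic  # smaller than C2, since the expected counts differ
gof$p.value    # the value Ted quotes for chisq.test(x), about 0.00065
```

The two p-values differ by more than an order of magnitude only because
the "expected" vector of 80s sums to 960 rather than 851.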

This is further evidence against using spreadsheets for this kind of
analysis.
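
By contrast, the check in R that Peter mentions can be sketched as follows.
This is only an illustration under the same assumptions as the thread (the
counts x and the 80s as "expected" values); res and fit are names made up
for the example. The rescale.p argument is part of chisq.test() in current
versions of R.

```r
x <- c(79, 52, 69, 71, 82, 87, 95, 74, 55, 78, 49, 60)
m <- rep(80, length(x))

# m / sum(x) sums to 960 / 851 rather than 1, so chisq.test() stops with
# an error instead of silently computing a statistic.
res <- tryCatch(chisq.test(x, p = m / sum(x)),
                error = function(e) conditionMessage(e))
res

# With rescale.p = TRUE the weights are rescaled to sum to 1. Equal weights
# then give the same test as chisq.test(x).
fit <- chisq.test(x, p = m, rescale.p = TRUE)
fit$p.value
```

Whether rescaling is appropriate depends on what the 80s were meant to
represent, which the original post does not say.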

> Turkey soup for a few days now?

Yes, indeed, along with turkey salad, turkey sandwiches...  :-)

My son is home from McGill in Montreal for the weekend, so he gets to
celebrate Thanksgiving a second time. He can help to reduce the turkey
inventory before flying back on Sunday...  ;-)

Best regards,

Marc



