[R] Chi-squared test

Fri Nov 25 11:33:37 CET 2005

P Ehlers wrote:

>
> Marc Schwartz wrote:
>
>> On Thu, 2005-11-24 at 21:55 +0000, Ted Harding wrote:
>>
>>> On 24-Nov-05 P Ehlers wrote:
>>>
>>>> Bianca Vieru-Dimulescu wrote:
>>>>
>>>>> Hello,
>>>>> I'm trying to calculate a chi-squared test to see if my data are 
>>>>> different from the theoretical distribution or not:
>>>>>
>>>>> chisq.test(rbind(c(79,52,69,71,82,87,95,74,55,78,49,60),
>>>>>                  c(80,80,80,80,80,80,80,80,80,80,80,80)))
>>>>>
>>>>>      Pearson's Chi-squared test
>>>>>
>>>>> data:  rbind(c(79, 52, 69, 71, 82, 87, 95, 74, 55, 78, 49, 60),
>>>>>             c(80, 80, 80, 80, 80, 80, 80, 80, 80, 80, 80, 80))
>>>>> X-squared = 17.6, df = 11, p-value = 0.09142
>>>>>
>>>>> Is this correct? If I'm doing the same thing using Excel I obtained
>>>>> a different value of p.. (1.65778E-14)
>>>>>
>>>>> Thanks a lot,
>>>>> Bianca
>>>>
>>>>
>>>> It would be unusual to have 12 observed frequencies all equal to 80.
>>>> So I'm guessing that you have a 12-category variable and want to
>>>> test its fit to a discrete uniform distribution. I assume that your
>>>> frequencies are
>>>>
>>>> x <- c(79, 52, 69, 71, 82, 87, 95, 74, 55, 78, 49, 60)
>>>>
>>>> Then just use
>>>>
>>>> chisq.test(x)
>>>>
>>>> (see the help page).
>>>>
>>>> (If those 80's are expected cell frequencies, they should sum to
>>>> sum(x) = 851.)
>>>>
>>>> I don't know what Excel does.
>>>>
>>>> Peter
>>>>
>>>> Peter Ehlers
>>>> University of Calgary
>>>
>>>
>>> I'm rather with Peter on this question! I've tried to infer what
>>> you're really trying to do.
>>>
>>> My a-priori plausible hypothesis was that you have
>>>
>>>  k<-12
>>>
>>> independent observations which have equal expected values
>>>
>>>  m<-rep(80,k)
>>>
>>> and are observed as
>>>
>>>  x<-c(79,52,69,71,82,87,95,74,55,78,49,60)
>>>
>>> On this basis, a chi-squared test Sum((O-E)^2/E) gives
>>>
>>>  C2<-sum(((x-m)^2)/m)
>>>
>>> so C2 = 41.1375, and on this hypothesis the chi-squared would
>>> have k=12 degrees of freedom. Then:
>>>
>>>  1-pchisq(C2,k)
>>> ## [1] 4.647553e-05
>>>
>>> which is nowhere near the 1.65778E-14 you report from Excel.
>>> Also, the result from Peter's chisq.test(x) is p = 0.0006468,
>>> even further away.
>>
>> It's late on Turkey Day here, but shouldn't that be:
>>
>> > 1 - pchisq(C2, k - 1)  # 11 df
>> [1] 2.282202e-05
>>
>> which is what I get using OO.org's Calc 2.0 with the CHITEST function
>> using the two vectors as the observed (x) and expected (m) values. I
>> also get this result from Gnumeric 1.4.3 using the same CHITEST
>> function.
>>
> [snip]
>
> Marc, it's a bit sad to see that OO.org copies Excel's behaviour
> to a _fault_. As Peter D. points out, we would expect the expected
> frequencies and the observed frequencies to sum to the same value.
> Excel (and Calc) blithely ignores that. R, OTOH, gives an error
> message when the probabilities don't sum to 1.
>
> Turkey soup for a few days now? 

Thanks a lot for your answers! I had a mistake in my Excel sheet :(,
sorry. I corrected it and indeed I now obtain 2.282202e-05.

As I want to make a comparison between independent observations which
have equal expected values, I will do as Marc suggested and give up on
the idea of using Excel :)

Bianca
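
For reference, the calculations discussed in this thread can be
reproduced in one place. This is only a sketch: the variable names
p_df12, p_df11, fit, bad and rescaled are chosen here for illustration
and do not appear in the messages above, and the values in the comments
are the ones reported in the thread.

```r
# Observed counts from the thread; m holds the hypothesised expected count per cell.
x <- c(79, 52, 69, 71, 82, 87, 95, 74, 55, 78, 49, 60)
k <- length(x)
m <- rep(80, k)

# Pearson statistic against fixed expected values of 80: Sum((O-E)^2/E).
C2 <- sum((x - m)^2 / m)                              # 41.1375

# Upper-tail p-values for the two degrees-of-freedom choices discussed.
p_df12 <- pchisq(C2, df = k,     lower.tail = FALSE)  # thread reports 4.647553e-05
p_df11 <- pchisq(C2, df = k - 1, lower.tail = FALSE)  # thread reports 2.282202e-05

# Goodness of fit to a discrete uniform: expected counts are sum(x)/k, not 80.
fit <- chisq.test(x)                                  # thread reports p = 0.0006468

# chisq.test() refuses "probabilities" that do not sum to 1 ...
bad <- try(chisq.test(x, p = m), silent = TRUE)
# ... unless asked to rescale them, which here gives the uniform test again.
rescaled <- chisq.test(x, p = m, rescale.p = TRUE)
```

Note that sum(x) is 851 while sum(m) is 960, which is why the
spreadsheet result (expected values taken as given) and chisq.test(x)
(expected values rescaled to the observed total) do not agree.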