[R] Odd results with Chi-square test. (Not an R problem, but general statistics, I think)

Polwart Calum (County Durham and Darlington NHS Foundation Trust) calum.polwart at nhs.net
Tue Aug 18 19:17:20 CEST 2009


I'm far from an expert on stats, but what I think you are saying is: when you compare Baseline with Version 3, the p-value isn't as good as for versions 1 and 2.  I'm not 100% sure you are meant to compare p-values like that, but I'll let someone else comment on it!

            total   incorrect   correct   % correct
baseline      898         708       190       21.2%
version_1     898         688       210       23.4%
version_2     898         680       218       24.3%
version_3    1021         790       231       22.6%

>
> Here, the p value for version_3 (when compared with the baseline) seems to
> make no sense whatsoever. It shouldn't be larger than the other two p
> values, the increase in correct answers (that is what counts!) is bigger
> after all.
>
No, it's not the raw numbers; it's the proportion of correct answers that counts.

I've added a % correct column to your data - does that make it clearer?  Only 22.6% of version 3's answers were correct, so the difference from baseline, as a change in proportion, is smaller than what versions 1 and 2 produced.

From my naive perspective I'd want to test for a difference between each version and baseline, and then v1 & v2, v1 & v3, v2 & v3 (you may tell me those are unsound things to test - in which case don't test them).  You'd then need to decide a threshold for calling a result significant (say p < 0.05).  I'd contend the tests should be two-tailed - results could be better or worse.
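Untested, but in R those comparisons could look something like this ('correct' and 'total' are just my own names for your counts):

# Counts from the table above
correct <- c(baseline = 190, version_1 = 210, version_2 = 218, version_3 = 231)
total   <- c(baseline = 898, version_1 = 898, version_2 = 898, version_3 = 1021)

round(100 * correct / total, 1)          # reproduces the % correct column

# Each version against baseline, two-tailed by default
for (v in c("version_1", "version_2", "version_3")) {
  print(prop.test(correct[c("baseline", v)], total[c("baseline", v)]))
}

# And the version-vs-version pairs: v1 & v2, v1 & v3, v2 & v3
prop.test(correct[c("version_1", "version_2")], total[c("version_1", "version_2")])
prop.test(correct[c("version_1", "version_3")], total[c("version_1", "version_3")])
prop.test(correct[c("version_2", "version_3")], total[c("version_2", "version_3")])

prop.test on a 2x2 like this is equivalent to chisq.test on the corresponding contingency table (both apply Yates' continuity correction by default), so the p-values should match what you already have.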

You should also develop a hypothesis.  Let me create one for you:


A.
H1: version1 of the software is better than baseline
(H0: version 1 is no better than baseline)

B.
H1: version2 of the software is better than version 1
(H0: version 2 is no better than version 1)

C.
H1: version3 of the software is better than version 2
(H0: version 3 is no better than version 2)

Now look at your results and p-values and work out, for each pair, whether you can reject H0. You could develop further variants (D: version 3 is better than baseline).
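If you do want the one-sided versions of these, prop.test takes an 'alternative' argument; again untested, just a sketch for hypothesis A:

# H1: version 1 beats baseline (group order matters: the alternative
# refers to the first proportion being greater than the second)
prop.test(c(210, 190), c(898, 898), alternative = "greater")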

Finally - remember to consider the 'clinical significance' as well as the statistical significance.  I'd have hoped a software change might have increased correct answers to, say, 40%?  And remember also that a p-value threshold of 0.05 carries a false positive rate of 1 in 20 for each test you run.
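One way to compensate, if you run all those tests at once, is to adjust the p-values for multiple testing; reusing the vectors from the sketch above (and Holm's method, which is p.adjust's default):

# Collect the p-values from the tests you actually run...
p_values <- sapply(c("version_1", "version_2", "version_3"), function(v)
  prop.test(correct[c("baseline", v)], total[c("baseline", v)])$p.value)
# ...and adjust them before comparing against your 0.05 threshold
p.adjust(p_values, method = "holm")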

>
> Any idea what's going on here? I thought the sample size should have no
> impact on the results?
>
Erm... sample size always has an influence on results.  If you show a difference in 100 samples, you would expect a larger p-value, for virtually any statistical test you chose, than if you showed that same difference in 1,000 samples.  You have a bigger sample but a smaller difference in proportions, so in effect you can be less sure that the change is not down to chance.  (Purist statisticians will likely challenge that wording.)
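You can see the effect directly; these are made-up round numbers (roughly your 21% vs 23% split) purely to illustrate:

# Same proportions, three sample sizes
prop.test(c(21, 23), c(100, 100))$p.value          # small n: p is large
prop.test(c(210, 230), c(1000, 1000))$p.value      # 10x the data: p shrinks
prop.test(c(2100, 2300), c(10000, 10000))$p.value  # 100x: now p < 0.05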

