[R] chisq.test, basic question

Huntsinger, Reid reid_huntsinger at merck.com
Tue Jul 30 23:15:11 CEST 2002

```My previous reply (below) uses "false positive" in a particularly misleading
way. I intended this to mean "incorrect rejection of the null hypothesis of
no association". I succumbed to the temptation to call a "rejection of the
null hypothesis of no association" a "positive" (cancelling a double
negative?), but as it is a rejection (of no matter what) I should have
called it a "negative".

Reid Huntsinger

-----Original Message-----
From: Huntsinger, Reid [mailto:reid_huntsinger at merck.com]
Sent: Tuesday, July 30, 2002 12:07 PM
To: 'juli g. pausas'; r-help
Subject: RE: [R] chisq.test, basic question

The cells are interpreted as counts, so by scaling you're analyzing a
different experiment (one with fewer observations). So the chi-squared value
will change: ignoring rounding and Yates' continuity correction, each term
(O-E)^2/E in the statistic scales linearly with the total count.

The chisq.test on the original data is a test of association. Conventionally
you decide ahead of time on a threshold for "false positives", say 5%, then
use the reported p-value to determine whether to accept or reject the null
hypothesis of no association. Had you chosen 5%, since the reported p-value
is smaller than 5%, you would reject, i.e., decide that association is
present.

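The chi-squared statistic reported by chisq.test is not really a measure of
association, since it depends on the number of observations as well as on
the strength of the association. Your observation is a nice illustration of
why. There are many measures of association (e.g., odds ratio); see for
example Alan Agresti's "Categorical Data Analysis" for some discussion.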

Reid Huntsinger

-----Original Message-----
From: juli g. pausas [mailto:juli at ceam.es]
Sent: Tuesday, July 30, 2002 12:12 PM
To: r-help
Subject: [R] chisq.test, basic question

Dear R-users,
I have a question, and I'm not sure whether it stems from my
misunderstanding of basic statistics, my misunderstanding of R, or
both.
I've got the counts of a 2 x 2 contingency table, and I'd like to test
the association:

m <-  matrix(c(15,28,32,135), 2, 2)
colnames(m) <- c("R-", "R+"); rownames(m) <- c("P-", "P+")
m
#    R-  R+
# P- 15  32
# P+ 28 135

chisq.test(m)  # X-squared = 4.0027, df = 1, p-value = 0.04543

Is this the correct way to test association between P and R? (I haven't
got the original data).
My problem is that if I use percentages, then I get different results:

m2 <- 100*m/sum(m)
chisq.test(round(m2)) # X-squared = 1.5318, df = 1, p-value = 0.2158

Shouldn't this give about the same result (apart from the rounding)?
Shouldn't the degree of association between P and R be the same? Or am I
using chisq.test() wrongly?

Juli

```
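The scaling argument in the reply can be checked directly in R. The following is a minimal sketch, not part of the original thread: the helper names `x2` and `odds_ratio` and the scaling factor `k` are made up for illustration, and Yates' correction is switched off (`correct = FALSE`) so that the linear scaling is exact. It reuses the counts from the question and compares the Pearson statistic with the sample odds ratio, one of the association measures mentioned in the reply.

```r
# Counts from the original post (column-major fill, as in the question).
m <- matrix(c(15, 28, 32, 135), 2, 2,
            dimnames = list(c("P-", "P+"), c("R-", "R+")))

# Pearson chi-squared statistic WITHOUT Yates' continuity correction,
# so that scaling the table scales the statistic exactly.
# (Hypothetical helper, for illustration only.)
x2 <- function(tab) unname(chisq.test(tab, correct = FALSE)$statistic)

# Sample odds ratio of a 2 x 2 table: (a * d) / (b * c).
# (Hypothetical helper, for illustration only.)
odds_ratio <- function(tab) {
  (tab[1, 1] * tab[2, 2]) / (tab[1, 2] * tab[2, 1])
}

k <- 2              # illustrative scaling factor

x2(m)               # statistic for the observed counts
x2(k * m)           # k times larger: same proportions, more observations

odds_ratio(m)       # association measure for the observed counts
odds_ratio(k * m)   # unchanged, because the factor k cancels
```

Under these assumptions the test statistic grows in proportion to the total count while the odds ratio stays fixed, which is consistent with the point that the p-value answers "is there evidence of association?" rather than "how strong is it?".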