[Rd] summary.table bug in parameter (and fix) (PR#2526)

rreeve@liposcience.com rreeve@liposcience.com
Mon Feb 3 18:43:02 2003


I sent this in with an old version, but it's in latest version as well. The fix is simple.

In the summary.table function, the parameter is calculated incorrectly
for a test of independence among all cells when the table is more than
2-way table.

Example:

Consider X:
> X
     a   b  c
1   A1  B2 C1
2   A3 BA3 C2
3   A2  B1 C4
4   A1  B2 C3
5   A3 BA3 C2
6   A1 BA3 C1
7   A2 BA3 C2
8   A1 BA3 C3
9   A1  B2 C1
10  A1 BA3 C2
11  A2  B1 C2
12  A1  B2 C3
13  A3 BA3 C4
14  A2  B1 C3
15  A2  B1 C2
16  A2 BA3 C4
17  A3 BA3 C3
18  A3 BA3 C4
19  A2 BA3 C4
20  A1  B2 C3
21  A1 BA3 C1
22  A2  B1 C2
23  A2 BA3 C3
24  A2  B1 C2
25  A2  B1 C2
26  A3  B1 C1
27  A2 BA3 C1
28  A2 BA3 C2
29  A2 BA3 C4
30  A3  B1 C2
31  A1  B1 C4
32  A2  B2 C1
33  A1  B2 C3
34  A2 BA3 C3
35  A2 BA3 C1
36  A2 BA3 C1
37  A2  B2 C2
38  A2  B2 C2
39  A2  B2 C4
40  A2  B1 C4
41  A2  B2 C4
42  A1  B1 C3
43  A2 BA3 C4
44  A2 BA3 C2
45  A2  B1 C2
46  A2  B2 C1
47  A3  B1 C3
48  A1  B2 C1
49  A2  B1 C2
50  A1  B1 C1
51  A3  B2 C4
52  A1  B2 C1
53  A3  B2 C1
54  A2 BA3 C1
55  A2  B1 C2
56  A1 BA3 C3
57  A3  B1 C2
58  A1  B1 C3
59  A2  B2 C2
60  A2 BA3 C1
61  A2  B1 C3
62  A3  B2 C2
63  A2 BA3 C2
64  A2  B1 C3
65  A1 BA3 C3
66  A2  B2 C2
67  A2  B1 C4
68  A2 BA3 C2
69  A2 BA3 C2
70  A2  B1 C3
71  A3 BA3 C2
72  A3 BA3 C3
73  A2  B2 C1
74  A2  B1 C3
75  A1 BA3 C1
76  A2 BA3 C3
77  A3  B1 C4
78  A3 BA3 C3
79  A2  B1 C3
80  A2  B1 C3
81  A2 BA3 C4
82  A2  B1 C2
83  A3 BA3 C1
84  A2 BA3 C1
85  A2 BA3 C4
86  A3 BA3 C1
87  A2 BA3 C1
88  A2  B2 C1
89  A3 BA3 C2
90  A2 BA3 C1
91  A1  B1 C4
92  A2  B1 C1
93  A2  B1 C1
94  A2  B1 C3
95  A2 BA3 C1
96  A3  B2 C1
97  A3 BA3 C4
98  A2  B1 C2
99  A2 BA3 C3
100 A2  B1 C2

> X.table <- table(X)
> X.table
, , c = C1

    b
a    B1 B2 B3
  A1  3  2  3
  A2  6  4  5
  A3  3  2  0

, , c = C2

    b
a    B1 B2 B3
  A1  0  0  1
  A2  7  7  7
  A3  3  1  3

, , c = C3

    b
a    B1 B2 B3
  A1  3  3  3
  A2  6  3  3
  A3  1  0  3

, , c = C4

    b
a    B1 B2 B3
  A1  0  0  2
  A2  2  2  7
  A3  4  1  0

Now, we construct a table from X (3x3x4):
> X.table <- table(X)
> summary(X.table)
Number of cases in table: 100 
Number of factors: 3 
Test for independence of all factors:
        Chisq = 30.232, df = 12, p-value = 0.002576
        Chi-squared approximation may be incorrect

The df is incorrect. It is calculated as 2*2*3 = 12, but should be 3*3*4 - 1  - (3-1) - (3-1) - (4-1) = 28 (df of interaction terms not in main effects only model). Now consider equivalent model tested in loglin:

> loglin(X.table, list(1,2,3))
2 iterations: deviation 3.552714e-15 
$lrt
[1] 38.32701

$pearson
[1] 30.23244

$df
[1] 28

$margin
$margin[[1]]
[1] "a"

$margin[[2]]
[1] "b"

$margin[[3]]
[1] "c"


The statistic is identical, but df different. The problem is in summary.table:

Current version of summary.table (INCORRECT VERSION):

function (object, ...) 
{
    if (!inherits(object, "table")) 
        stop("object must inherit from class table")
    n.cases <- sum(object)
    n.vars <- length(dim(object))
    y <- list(n.vars = n.vars, n.cases = n.cases)
    if (n.vars > 1) {
        m <- vector("list", length = n.vars)
        for (k in seq(along = m)) {
            m[[k]] <- apply(object, k, sum)/n.cases
        }
        expected <- apply(do.call("expand.grid", m), 1, prod) * 
            n.cases
        statistic <- sum((c(object) - expected)^2/expected)
        parameter <- prod(sapply(m, length) - 1)			#### Problem is on this line (works only for 2-way tables)
        y <- c(y, list(statistic = statistic, parameter = parameter, 
            approx.ok = all(expected >= 5), p.value = pchisq(statistic, 
                parameter, lower.tail = FALSE), call = attr(object, 
                "call")))
    }
    class(y) <- "summary.table"
    y
}

summary.table should be (CORRECTED VERSION)

function (object, ...) 
{
    if (!inherits(object, "table")) 
        stop("object must inherit from class table")
    n.cases <- sum(object)
    n.vars <- length(dim(object))
    y <- list(n.vars = n.vars, n.cases = n.cases)
    if (n.vars > 1) {
        m <- vector("list", length = n.vars)
        for (k in seq(along = m)) {
            m[[k]] <- apply(object, k, sum)/n.cases
        }
        expected <- apply(do.call("expand.grid", m), 1, prod) * 
            n.cases
        statistic <- sum((c(object) - expected)^2/expected)
        parameter <- prod(sapply(m, length)) - (sum(sapply(m, 
            length) - 1) + 1)					### This line changed (works for all multiway tables)
        y <- c(y, list(statistic = statistic, parameter = parameter, 
            approx.ok = all(expected >= 5), p.value = pchisq(statistic, 
                parameter, lower.tail = FALSE), call = attr(object, 
                "call")))
    }
    class(y) <- "summary.table"
    y
}


Using the corrected summary.table (using the correct code above):

> summary(X.table)
Number of cases in table: 100 
Number of factors: 3 
Test for independence of all factors:
        Chisq = 30.232, df = 28, p-value = 0.3522
        Chi-squared approximation may be incorrect

These are correct.

--Russell Reeve
rreeve@liposcience.com

--please do not edit the information below--

Version:
 platform = i386-pc-mingw32
 arch = i386
 os = mingw32
 system = i386, mingw32
 status = 
 major = 1
 minor = 6.2
 year = 2003
 month = 01
 day = 10
 language = R

Windows 2000 Professional (build 2195) Service Pack 2.0

Search Path:
 .GlobalEnv, package:ctest, Autoloads, package:base