[R] Multiple Comparisons-Kruskal-Wallis-Test: kruskal{agricolae} and kruskalmc{pgirmess} don't yield the same results although they should do (?)

peter dalgaard pdalgd at gmail.com
Fri Aug 3 23:30:29 CEST 2012

On Aug 3, 2012, at 18:49, David L Carlson wrote:

> Generally multiple comparisons are conducted after a test for a significant
> difference among any of the groups. For your data
>
>> kruskal.test(x[,1]~x[,2])
>
>        Kruskal-Wallis rank sum test
>
> data:  x[, 1] by x[, 2]
> Kruskal-Wallis chi-squared = 11.0098, df = 10, p-value = 0.3568
>
> There are no significant differences between the groups, so there is no
> reason to use a multiple comparison test.

There's a point to that, but on the other hand, if multiple comparison methods control the familywise error rate by themselves, additional "guarding" by a global test really just complicates matters.

I forgot that we actually had the data... A few points can be made.

First, this is a really unbalanced design:

> table(x[,2])

  1   2   3   4   5   6   7   8   9  10  11
267  39  23  24  25  21  17  19  15  16  34

This makes the grouping method inherently suspect, since it assumes at least roughly similar group sizes. However, since that method doesn't even attempt to correct for multiple tests, the point is somewhat moot.
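For the record, the degree of imbalance and the number of implied pairwise tests can be read straight off the counts above:

```r
# Group counts as quoted in the table() output above
n <- c(267, 39, 23, 24, 25, 21, 17, 19, 15, 16, 34)
max(n) / min(n)       # 17.8: the largest group is about 18 times the smallest
choose(length(n), 2)  # 55 pairwise comparisons among 11 groups
```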

If we forget the grouping technique, kruskal() gives
> kruskal(x[,1], x[,2], group=F)
....
Comparison between treatments mean of the ranks

Difference   pvalue sig          LCL       UCL
1 - 10  -26.6163390 0.473942      -99.5913928  46.35871
1 - 11  -62.3553095 0.018026   * -113.9832721 -10.72735
...
11 - 3   64.9916880 0.095914   .  -11.5557888 141.53916
11 - 4   84.9056373 0.027780   *    9.3153935 160.49588
11 - 5   75.6064706 0.047290   *    0.9077143 150.30523
11 - 6   61.4978992 0.125302      -17.1938261 140.18962
....

so three comparisons are formally significant at the 0.05 level, but this is without any correction for the multiple comparisons. Such a correction roughly amounts to multiplying all p-values by 55 (actually by 55, 54, 53, ... in the Holm step-down method), which of course leaves nothing significant:

Comparison between treatments mean of the ranks

Difference  pvalue sig        LCL       UCL
1 - 10  -26.6163390 1.00000     -150.58158  97.34890
1 - 11  -62.3553095 0.99143     -150.05751  25.34689
...
11 - 3   64.9916880 1.00000      -65.04215 195.02552
11 - 4   84.9056373 1.00000      -43.50211 213.31339
11 - 5   75.6064706 1.00000      -51.28688 202.49982
11 - 6   61.4978992 1.00000      -72.17844 195.17424
...

which is in no way at variance with the global test, nor with the fact that kruskalmc shows no differences to be significant. It also roughly matches the results of the stock pairwise.wilcox.test:

> pairwise.wilcox.test(x[,1], x[,2])

Pairwise comparisons using Wilcoxon rank sum test

data:  x[, 1] and x[, 2]

1    2    3    4    5    6    7    8    9    10
2  1.00 -    -    -    -    -    -    -    -    -
3  1.00 1.00 -    -    -    -    -    -    -    -
4  1.00 1.00 1.00 -    -    -    -    -    -    -
5  1.00 1.00 1.00 1.00 -    -    -    -    -    -
6  1.00 1.00 1.00 1.00 1.00 -    -    -    -    -
7  1.00 1.00 1.00 1.00 1.00 1.00 -    -    -    -
8  1.00 1.00 1.00 1.00 1.00 1.00 1.00 -    -    -
9  1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 -    -
10 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 -
11 0.91 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00

The same comparisons without correction (presumably run with p.adjust.method = "none"; the second call itself is not shown in the message):

Pairwise comparisons using Wilcoxon rank sum test

data:  x[, 1] and x[, 2]

1     2     3     4     5     6     7     8     9     10
2  0.705 -     -     -     -     -     -     -     -     -
3  0.916 0.748 -     -     -     -     -     -     -     -
4  0.469 0.557 0.343 -     -     -     -     -     -     -
5  0.675 0.587 0.733 0.920 -     -     -     -     -     -
6  0.985 0.727 0.805 0.608 0.869 -     -     -     -     -
7  0.226 0.349 0.311 0.153 0.187 0.362 -     -     -     -
8  0.282 0.562 0.288 0.135 0.325 0.524 0.874 -     -     -
9  0.173 0.246 0.296 0.105 0.162 0.351 0.820 0.728 -     -
10 0.519 0.650 0.539 0.172 0.407 0.759 0.857 0.855 0.737 -
11 0.016 0.105 0.091 0.041 0.055 0.121 0.727 0.420 0.922 0.499
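The "multiply by 55, 54, 53, ..." description of the Holm correction is easy to check by hand. A quick sketch, assuming for illustration that the four p-values below (quoted from the kruskal() output above) are the four smallest of all 55 comparisons:

```r
# Hand-check of the Holm step-down arithmetic
p <- sort(c(0.018026, 0.027780, 0.047290, 0.095914))
m <- choose(11, 2)                  # 55 pairwise comparisons in total
mult <- m - seq_along(p) + 1        # multipliers 55, 54, 53, 52
holm_by_hand <- pmin(1, cummax(mult * p))
holm_by_hand[1]                     # 55 * 0.018026 = 0.99143, as in the adjusted table
# p.adjust() reproduces this when told the true number of comparisons:
all.equal(holm_by_hand, p.adjust(p, method = "holm", n = m))
```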

(They don't get exactly the same p-values because pairwise.wilcox.test is based on ranks recomputed separately for each pair of groups.)
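A toy sketch of that re-ranking, with data invented for illustration:

```r
# kruskal() compares mean ranks computed once on the pooled sample;
# pairwise.wilcox.test() recomputes ranks for each pair of groups.
y <- c(3, 1, 4, 9, 2, 6, 5, 8, 7)      # nine distinct values, no ties
g <- factor(rep(c("a", "b", "c"), each = 3))
joint <- rank(y)                        # ranks on the pooled sample
pair  <- rank(y[g != "c"])              # ranks recomputed for the a/b pair only
joint[g != "c"]                         # 3 1 4 9 2 6
pair                                    # 3 1 4 6 2 5: not the same ranking
```

Since the two rankings differ, rank-based p-values built from them need not agree exactly either.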

--
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd.mes at cbs.dk  Priv: PDalgd at gmail.com
