[R] Jonckheere-Terpstra test

Mon Oct 6 14:10:25 CEST 2003

On 06-Oct-03 Kim Mouridsen wrote:
> The Jonckheere-Terpstra test is a distribution-free test for ordered
> alternatives in a one-way layout. More specifically, assume
> 
> X_ij = m + t_j + e_ij,                i=1,...,n_j and j=1,...,k,
> 
> where the errors are independent and identically distributed. Then you
> can use the Jonckheere-Terpstra test to test
> 
> H_0:    t_1=t_2=...=t_k
> against
> H_A:    t_1<=t_2<=...<=t_k,
> 
> where at least one of the inequalities is strict.

To be more precise: the Jonckheere test applies where, under H_A, the
distributions are stochastically increasing in j (P[X_j <= x] >=
P[X_(j+1) <= x] for all x and j, with strict inequality for at least
one j). When k=2, the Jonckheere statistic is the same as the
Mann-Whitney U. For k > 2, it is the sum for j1 = 1:(k-1) of the sum
for j2 = (j1+1):k of the Mann-Whitney Us for samples j1 and j2. To apply
it cleanly, there is an assumption of no ties (as if variables were
continuous).
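
As a minimal sketch of that double sum (in Python rather than R, and
assuming no ties): the function names and the data below are made up for
illustration, and the counting convention (pairs with the earlier-sample
value below the later-sample value) is an assumption; the mirror-image
convention works equally well.

```python
def mann_whitney_count(xs, ys):
    """Number of pairs (x, y) with x < y; assumes there are no ties."""
    return sum(1 for x in xs for y in ys if x < y)


def jonckheere_statistic(samples):
    """Sum of the Mann-Whitney counts over all sample pairs j1 < j2."""
    k = len(samples)
    return sum(
        mann_whitney_count(samples[j1], samples[j2])
        for j1 in range(k - 1)
        for j2 in range(j1 + 1, k)
    )


# Illustrative data only: three samples of sizes 2, 3, 2.
groups = [[1.2, 2.5], [2.1, 3.3, 4.0], [3.9, 5.1]]
w = jonckheere_statistic(groups)  # pairwise counts are 5, 4 and 5
```

With k=2 this reduces to the single Mann-Whitney count, as stated above.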

> To my knowledge there is no R code for this test but the test statistic
> is not too hard to calculate (you have to calculate some Mann-Whitney
> counts) and the p-value can be found in a table - or in case you have
> many observations you can use a large-sample approximation. 

I once published an algorithm for the exact distribution in JRSS C
(Applied Statistics), 1984, pp. 1-6.

This was devised in the days of programmable calculators and 64K 4MHz
micros when RAM and computation time were major issues. However, the code
is easily implemented and may be as straightforward as any, even now.

Let W denote the Jonckheere statistic. To compute the distribution of W
over the range 0:M, where M is the largest value of W for which a count
is wanted, use the following algorithm (which applies as it stands to
the M-W U statistic with sample sizes m, n).
NOTE that indexing starts at 0.

U[M,m,n]:
[A]  f(0) = 1 ; f(1) = ... = f(M) = 0
[B]  If (n+1 > M) go to [C]
     P = min(m+n,M)
     for( t = (n+1):P )
       for( u = M:t ) ## Note reverse order: u = M,M-1,...,t
         f(u) = f(u) - f(u-t)

[C]  Q = min(m,M)
     for( s = 1:Q )
       for( u = s:M )
         f(u) = f(u) + f(u-s)
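
A direct transcription of U[M,m,n] for the two-sample case might look
like the sketch below. It is in Python (0-based indexing, as in the
listing) rather than R, and the function name is my own invention. The
test "if (n+1 > M) go to [C]" needs no separate branch here, since the
range in step [B] is then simply empty.

```python
def mann_whitney_counts(M, m, n):
    """Counts f[0..M] of the ways U = 0..M can arise for sample sizes m, n."""
    # [A] start from f(0) = 1, f(1) = ... = f(M) = 0
    f = [1] + [0] * M
    # [B] t = n+1 .. min(m+n, M), with u running downwards from M to t
    for t in range(n + 1, min(m + n, M) + 1):
        for u in range(M, t - 1, -1):
            f[u] -= f[u - t]
    # [C] s = 1 .. min(m, M), with u running upwards from s to M
    for s in range(1, min(m, M) + 1):
        for u in range(s, M + 1):
            f[u] += f[u - s]
    return f


counts = mann_whitney_counts(4, 2, 2)  # → [1, 1, 2, 1, 1]
```

The five counts add up to 6, the number of ways of choosing 2 places out
of 4, which is a handy check.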

For Jonckheere with sample sizes n_1, n_2, ... , n_k, let

     N_i = n_(i+1) + ... + n_k ,  i = 1, 2, ... , k-1

Run the above first from [A] with m = n_1, n = N_1 and then repeatedly
from [B] with m = n_2, n = N_2; ... ; m = n_(k-1), n = N_(k-1) = n_k.

At the end, f(0), f(1), ... , f(M) will contain the frequencies (counts)
of the numbers of ways in which a value W = 0, 1, ... , M can arise
(purely combinatorial). On H_0, all redistributions of sample values
are equally likely, so to get the probability distribution divide by
the number of all possible reallocations (the multinomial coefficient
N!/(n_1! n_2! ... n_k!), where N = sum(n_i)).

Or you can compute the complete frequency distribution and divide each
term by the sum of all.
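
Put together for k samples, the whole recipe might be sketched as
follows (again in Python; the function names, the example sizes and the
"observed" value 14 are all made up for illustration). If M is not
given, the sketch takes the largest possible W, which is
(N^2 - sum(n_i^2))/2, so that the complete distribution is produced.

```python
from math import factorial


def jonckheere_counts(sizes, M=None):
    """Counts f[0..M] of the ways W = 0..M can arise (no ties assumed)."""
    total = sum(sizes)
    if M is None:
        # Largest possible W: sum of n_i * n_j over all pairs i < j.
        M = (total * total - sum(n * n for n in sizes)) // 2
    f = [1] + [0] * M                    # step [A], done once only
    remaining = total
    for m in sizes[:-1]:                 # m = n_1, ..., n_(k-1)
        remaining -= m
        n = remaining                    # n = N_i = n_(i+1) + ... + n_k
        for t in range(n + 1, min(m + n, M) + 1):    # step [B]
            for u in range(M, t - 1, -1):
                f[u] -= f[u - t]
        for s in range(1, min(m, M) + 1):            # step [C]
            for u in range(s, M + 1):
                f[u] += f[u - s]
    return f


def multinomial(sizes):
    """N! / (n_1! n_2! ... n_k!), the number of possible reallocations."""
    out = factorial(sum(sizes))
    for n in sizes:
        out //= factorial(n)
    return out


sizes = [2, 3, 2]                        # illustrative sample sizes
counts = jonckheere_counts(sizes)
denom = multinomial(sizes)               # 210 for these sizes
probs = [c / denom for c in counts]
p_upper = sum(counts[14:]) / denom       # P[W >= 14] on H_0
```

Whether the upper or the lower tail is the relevant one depends on which
way round the Mann-Whitney counts were taken.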

The above 'algorithmicises' the algebra of the generating function for
the counts.
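
Spelled out (the G notation is introduced here only for illustration),
the generating function for the M-W counts with sample sizes m, n is the
Gaussian binomial coefficient, and the Jonckheere one is the product of
such factors over the successive (n_i, N_i):

```latex
G_{m,n}(q) \;=\; \sum_{u=0}^{mn} f(u)\, q^{u}
  \;=\; \frac{\prod_{t=n+1}^{m+n} \left(1 - q^{t}\right)}
             {\prod_{s=1}^{m} \left(1 - q^{s}\right)},
\qquad
G_{W}(q) \;=\; \prod_{i=1}^{k-1} G_{n_i,\,N_i}(q).
```

Step [B] multiplies the current polynomial by each (1 - q^t), and step
[C] divides it by each (1 - q^s), in both cases keeping only the terms
up to q^M.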

Have fun!
Ted.

--------------------------------------------------------------------
E-Mail: (Ted Harding) <Ted.Harding at nessie.mcc.ac.uk>
Fax-to-email: +44 (0)870 167 1972
Date: 06-Oct-03                                       Time: 13:10:25
------------------------------ XFMail ------------------------------