[R] normal scores test

Peter Dalgaard p.dalgaard at biostat.ku.dk
Sat Mar 6 23:08:53 CET 2004


"Padmanabhan, Sudharsha" <sudar_80 at neo.tamu.edu> writes:

> Hello,
> 
> I need help in performing a Van_der_Waerden normal scores test in R. I 
> have two arrays of scores(final on therapy scores from drug and placebo) and 
> want to use the normal scores procdeure to test for significance. 
> (observations are unequal in number - due to dropouts). Could you please help 
> me out with the coding or let me know if there is a package that can be used 
> (for example, suppdist etc.)
> 
> Thanks in advance
> 
> Regards
> 
> ~S

I don't think we have the test pre-cooked (someone correct me if I'm
wrong) although I vaguely recall some e-conversations with Kurt Hornik
ages ago about doing a generic score test function for his ctest
package.

There are two things to it: calculating the actual scores and finding
the distribution of the test statistic. 

As far as I remember, the definition of the normal scores is that they
are the expected value of the jth order statistic in the normal
distribution. This is pretty close to qnorm(ppoints(n)), but not
exactly. However, the difference is likely to be immaterial. So your
scores would be qnorm(ppoints(n))[order(x)] or, in order to better
cope with ties, you might do 

n <- length(x)
s <- qnorm(rank(x)/(n+1)).

(make sure to remove NAs since rank() will otherwise have surprises
for you...)

The test statistic is 

T <- sum(s[group==1])

Once you have the scores, you need to work out the correlation
structure of s so that you can calculate the mean and variance of the
sum of the k observations in group 1. This is the same basic idea for
all scoring schemes. Keep the set of scores fixed and calculate the
permutation distribution.

The mean is obvious:

m <- mean(s)

and actually, so is the variance, but you need to know that you need
the "finite population variance" which divides by n rather than n-1.

V <- var(s) * (n - 1)/n

Now what about the covariance between two elements of s? This is
actually dead easy once you realize that the sum(s) is a constant and
that all covariances are identical for symmetry reasons: 

n * V + (n^2 - n) * C = 0  ==>  C = - 1/(n-1) * V

so, we get that T has mean k*m and variance k * (1 - k/(n-1)) * V

So, look up T in the normal distribution with those parameters and
you're basically done.

Note also that you can rather easily simulate from the permutation
distribution of T, e.g.:

replicate(10000, sum(s[sample(n,k)]))



-- 
   O__  ---- Peter Dalgaard             Blegdamsvej 3  
  c/ /'_ --- Dept. of Biostatistics     2200 Cph. N   
 (*) \(*) -- University of Copenhagen   Denmark      Ph: (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)             FAX: (+45) 35327907




More information about the R-help mailing list