[R] Wilcoxon signed rank test and its requirements

Fri Jun 25 21:00:50 CEST 2010

On Thu, 24 Jun 2010, Atte Tenkanen wrote:

>> On Jun 23, 2010, at 9:58 PM, Atte Tenkanen wrote:
>>
>>> Thanks. What I have had to ask is that
>>>
>>> how do you test that the data is symmetric enough?
>>> If it is not, is it ok to use some data transformation?
>>>
>>> when it is said:
>>>
>>> "The Wilcoxon signed rank test does not assume that the data are
>>> sampled from a Gaussian distribution. However it does assume that
>>> the data are distributed symmetrically around the median. If the
>>> distribution is asymmetrical, the P value will not tell you much
>>> about whether the median is different than the hypothetical value."
>>
>> You are being misled. Simply finding a statement on a statistics
>> software website, even one as reputable as Graphpad (???), does not
>> mean that it is necessarily true. My understanding (confirmed by
>> reviewing "Nonparametric statistical methods for complete and censored
>> data" by M. M. Desu and Damaraju Raghavarao) is that the Wilcoxon
>> signed-rank test does not require that the underlying distributions be
>> symmetric. The above quotation is highly inaccurate.
>>
>> --
>> David.
>
> Thanks. Unfortunately, I can't follow the reference at all, but I read this in that way that I can be carefree as far as the underlying distribution is concerned?
>
> Is there any other authoritative reference where that is just stated in a way "test does not require that the underlying distributions be   symmetric or normal".
>

The statement from GraphPad is correct, but for a different question.  Let me expound.

First let us consider means:

If you have paired samples X1.. Xn and Y1..Yn you could ask if the mean of X is equal to the mean of Y, or if the mean of (X-Y) is zero.   These are equivalent questions, because of the way the mean is defined.   So the paired t-test, which answers the first question, and the one-sample t-test, which answers the second question, are equivalent.  They have no assumptions (other than sufficient sample size for the means to be Normally distributed).

Now, let us consider medians.
If you have paired samples X1..Xn and Y1..Yn you could ask if the median of X is equal to the median of Y, or if the median of (X-Y) is zero.  The first question cannot be answered by any standard test (though there are ways to do it).  The second is answered by the sign test.  They are not at all equivalent: it is possible for the median of X to be larger than the median of Y but the median of (X-Y) to be negative.   The non-equivalence is true for essentially all statistics except for the mean.
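
A small numerical illustration of the contrast between means and medians may help. This is only a sketch in Python using the standard library; the three paired values are made up for the purpose and are not from any real data set.

```python
from statistics import mean, median

# Hypothetical paired data, chosen only to illustrate the point.
x = [0, 5, 10]
y = [1, 4, 20]
d = [xi - yi for xi, yi in zip(x, y)]  # paired differences X - Y

# Means: the difference of the means equals the mean of the differences
# (up to floating-point rounding).
diff_of_means = mean(x) - mean(y)
mean_of_diffs = mean(d)

# Medians: the two quantities need not agree, not even in sign.
diff_of_medians = median(x) - median(y)   # median(X) is larger than median(Y)
median_of_diffs = median(d)               # yet the median of X - Y is negative

print("means:  ", diff_of_means, mean_of_diffs)
print("medians:", diff_of_medians, median_of_diffs)
```

Here the median of X exceeds the median of Y, while the median of the paired differences is below zero, so the two "median" questions get opposite answers on the same data.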

Now, let us consider the Wilcoxon signed-rank test.
This can be characterized precisely as a test of the null hypothesis that the median pairwise mean of  X-Y is zero. That is, take all n(n-1)/2 pairs of (X-Y)s.  Take the mean of each pair to get n(n-1)/2 pairwise means. Take the median of these numbers.  The p-value will be 0.5 one-sided or 1.0 two-sided when this median pairwise mean is exactly zero.  The median pairwise mean is also sometimes known as the Hodges-Lehmann estimator (though this is strictly speaking a more general term).
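
The construction just described can be sketched in a few lines of Python (standard library only). The function names and the six differences are hypothetical, invented for illustration. One hedge: the description above pairs distinct differences only; the usual "Walsh average" convention also pairs each difference with itself, giving n(n+1)/2 means, and it is under that convention that the signed-rank statistic equals the number of positive pairwise means (assuming no zeros and no tied absolute values). The sketch offers both via an `include_self` flag.

```python
from itertools import combinations, combinations_with_replacement
from statistics import median

def median_pairwise_mean(d, include_self=False):
    """Median of the pairwise means of the differences d.

    include_self=False pairs distinct differences only, n(n-1)/2 means.
    include_self=True also pairs each difference with itself (Walsh averages).
    """
    pairs = combinations_with_replacement(d, 2) if include_self else combinations(d, 2)
    return median([(a + b) / 2 for a, b in pairs])

def signed_rank_statistic(d):
    """Sum of the ranks of |d| over the positive d (assumes no zeros or ties)."""
    ranked = sorted(d, key=abs)
    return sum(rank for rank, value in enumerate(ranked, start=1) if value > 0)

def positive_walsh_count(d):
    """Number of Walsh averages (each difference paired with itself too) above zero."""
    return sum(1 for a, b in combinations_with_replacement(d, 2) if a + b > 0)

# Hypothetical differences X - Y with distinct absolute values.
d = [1.5, -0.5, 2.0, -3.0, 4.0, 0.8]

print(median_pairwise_mean(d))
print(median_pairwise_mean(d, include_self=True))
print(signed_rank_statistic(d), positive_walsh_count(d))
```

The last line prints the same number twice, which is the link between the test statistic and the pairwise means: the test is in effect asking whether about half of the pairwise means fall on each side of zero.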

As David correctly points out, no assumptions are needed for the Wilcoxon signed-rank test to be a test of *this* null hypothesis.   The problem is that this may not be the null hypothesis you care about.  As GraphPad correctly points out, "the P value will not tell you much about whether the *median* is different than the hypothetical value" because the median is not the same as the median pairwise mean.  It is entirely possible for the median difference to be positive and the median pairwise mean difference to be zero or negative.

If you assume that the distribution of differences X-Y is symmetric, then the Wilcoxon signed-rank test also tests the null hypothesis that the median of X-Y is zero (and that the mean of X-Y is zero), because these null hypotheses are equivalent for a symmetric distribution.  That's what GraphPad is saying.
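
The same thing can be seen at the level of a sample. In this sketch (Python, standard library; the numbers are invented and the sample is constructed to be exactly symmetric about 2, which real data never are) the mean, the median and the median pairwise mean all coincide.

```python
from itertools import combinations
from statistics import mean, median

# Hypothetical differences, exactly symmetric about the centre 2.
centre = 2
d = [centre - 3, centre - 1, centre, centre + 1, centre + 3]

sample_mean = mean(d)
sample_median = median(d)
# Pairwise sums of a symmetric sample are symmetric about twice the centre,
# so the median of the pairwise means lands on the centre as well.
median_pairwise = median([(a + b) / 2 for a, b in combinations(d, 2)])

print(sample_mean, sample_median, median_pairwise)
```

With symmetry, the three location measures answer the same question, so it no longer matters which of them the test is really about.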

You could also assume that the distributions X and Y are stochastically ordered.  This basically implies that the direction of difference is the same no matter what location statistic you use to measure it. If X was before some intervention and Y was afterwards you would basically be assuming that the intervention is either beneficial for everyone or harmful for everyone (up to measurement error). Under this assumption, the signed rank test also tells you reliably about differences in medians.

To some extent this is a philosophical issue.  My preference is to know exactly what a test is doing and to make these distinctions.   Many other people, including reputable experts like Frank Harrell, believe (I think) that simplifying assumptions such as stochastic ordering are a pretty good approximation in a lot of situations, so it isn't necessary to always make these distinctions.

      -thomas

Thomas Lumley			Assoc. Professor, Biostatistics
tlumley at u.washington.edu	University of Washington, Seattle