[R] help comparing two median with R

Thomas Lumley tlumley at u.washington.edu
Tue Apr 17 16:48:07 CEST 2007


On Tue, 17 Apr 2007, Robert McFadden wrote:

>
>> -----Original Message-----
>> From: r-help-bounces at stat.math.ethz.ch
>> [mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Jim Lemon
>> Sent: Tuesday, April 17, 2007 12:37 PM
>> To: Pedro A Reche
>> Cc: r-help at stat.math.ethz.ch
>> Subject: Re: [R] help comparing two median with R
>>
>> Pedro A Reche wrote:
>>> Dear R users,
>>> I am new to R and I would like to ask your help with the following
>>> topic. I have three sets of numerical data; two sets are paired and
>>> the third is independent of the other two. For each of these sets I
>>> have obtained the basic statistics (mean, median, stdv, range ...).
>>> Now I want to test whether these sets differ. I could compare the
>>> means with a basic t-test. However, I was looking for a test to
>>> compare the medians using R. If that is possible I would love to
>>> hear the specifics.
>>
>> Hi Pedro,
>> You can use the Mann-Whitney test ("wilcox.test" with two
>> samples), but you would have to check that the second and
>> third moments of the variable distributions were the same, I think.
>>
>> Jim
> Use the Mann-Whitney U test, but remember its two assumptions:
> 1. the samples come from continuous distributions (there are no tied
> observations)
> 2. the distributions are identical in shape. It is very similar to the
> t-test, but the Mann-Whitney U test is not as affected by violations of
> the homogeneity of variance assumption as the t-test is.
>

This turns out not to be quite correct.

If the two distributions differ only by a location shift then the 
hypothesis that the shift is zero is equivalent to the medians being the 
same (or the means, or the 3.14159th percentile), and the Mann-Whitney U 
test will test this hypothesis. Otherwise the Mann-Whitney U test does not 
test for equal medians.
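
A small constructed example may make that last sentence concrete. The two 
samples below are hypothetical (made up for illustration, not taken from 
the original question): both have median exactly 0 but they differ in 
shape, and the Mann-Whitney statistic sits well away from its null 
expectation of n*m/2.

```r
# Hypothetical data: both samples have median 0, but the shapes differ,
# so this is not a location-shift situation.
x <- c(-1.2, -1.1, -1.0, -0.1, 0.1, 10, 11, 12)
y <- c(-12, -11, -10, -0.2, 0.2, 1.0, 1.1, 1.2)

median(x)  # 0
median(y)  # 0

# W counts the pairs (x_i, y_j) with x_i > y_j (there are no ties here).
w <- unname(wilcox.test(x, y)$statistic)
n_pairs <- length(x) * length(y)

w            # 41
n_pairs / 2  # 32, the expected value of W if P(X > Y) = 1/2
w / n_pairs  # estimate of P(X > Y), about 0.64
```

With eight observations per group this is nowhere near significant, but 
the quantity being estimated is P(X > Y) rather than a difference in 
medians, so larger samples of the same shape would be expected to reject 
even though the medians agree.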

The assumption that the distributions are continuous is for convenience -- 
it makes the distribution of the test statistic easier to calculate, and 
when there are ties R uses a normal approximation instead.  The 
assumption of a location shift is critical -- otherwise it is easy to 
construct three data sets x,y,z so that the Mann-Whitney U test thinks x 
is larger than y, y is larger than z and z is larger than x (search for 
"Efron's dice"). That is, without the location-shift assumption the 
Mann-Whitney U test cannot be a test for any location statistic.
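
As a rough illustration of the cycle, here is a sketch with three 
hypothetical data sets x, y, z. These are not Efron's original four dice, 
just a three-dice set with the same non-transitive property, and the 
helper name prob_greater is my own.

```r
# Three hypothetical "dice" forming a non-transitive set
# (for illustration only).
x <- c(2, 2, 4, 4, 9, 9)
y <- c(1, 1, 6, 6, 8, 8)
z <- c(3, 3, 5, 5, 7, 7)

# Proportion of pairs in which the first sample is larger; with no ties
# between samples this is the Mann-Whitney U statistic divided by the
# number of pairs.
prob_greater <- function(u, v) mean(outer(u, v, ">"))

prob_greater(x, y)  # 20/36: x tends to beat y
prob_greater(y, z)  # 20/36: y tends to beat z
prob_greater(z, x)  # 20/36: z tends to beat x
```

Each comparison favours the first sample by the same margin, so no 
ordering of x, y and z by "location" can agree with all three.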

There actually is an exact test for the median that does not assume a 
location shift:  dichotomize your data at the pooled median to get a 2x2 
table of above/below median by group, and do Fisher's exact test on the 
table.  This is almost never useful (because it doesn't come with an 
interval estimate), but is interesting because it (and the generalizations 
to other quantiles) is the only exactly distribution-free location test 
that does not have the 'non-transitivity' problem of the Mann-Whitney U 
test.  I believe this median test is attributed to Mood, but I have not 
seen the primary source.
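
The recipe above can be sketched in a few lines of R. The function name 
median_test and the data are hypothetical, and counting values equal to 
the pooled median as "not above" is my assumption; other conventions for 
such values exist.

```r
# Sketch of the median test: dichotomize at the pooled median, then
# apply Fisher's exact test to the 2x2 table.
# Assumption: values equal to the pooled median count as "not above".
median_test <- function(x, y) {
  values <- c(x, y)
  group <- factor(rep(c("x", "y"), times = c(length(x), length(y))),
                  levels = c("x", "y"))
  above <- factor(values > median(values), levels = c(FALSE, TRUE),
                  labels = c("not above", "above"))
  fisher.test(table(group, above))
}

# Hypothetical data, for illustration only.
x <- c(1.1, 2.3, 2.9, 3.4, 4.0, 4.8, 5.5, 6.1)
y <- c(3.9, 5.2, 6.4, 7.0, 7.7, 8.3, 9.1, 9.6)
median_test(x, y)$p.value
```

Fixing the factor levels keeps the table 2x2 even when one cell is empty, 
which fisher.test() needs in order to treat it as a 2x2 problem.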

 	-thomas


