[R] Mann-Whitney by group

Henric (Nilsson) Winell nilsson.henric at gmail.com
Sun Jul 22 18:15:21 CEST 2012

On 2012-07-17 05:13, R. Michael Weylandt wrote:

> On Mon, Jul 16, 2012 at 3:39 PM, Oxenstierna <david.chertudi at gmail.com> wrote:

>> lapply(thing, function(x) x[['p.value']]) --works very well, thank you.
>> Not to be a chore, but I'm interested in comparing the results of
>> wilcox.test--and the methodology we've employed so far--with the results and
>> methodology of wilcox_test (library("coin")).  So, I'd like to compare

There should not be any differences between the p-values obtained using 
'wilcox.test' and 'wilcox_test' in the asymptotic case.  However, the 
latter function allows you to use the exact null distribution even in 
the presence of ties, or use an Monte Carlo approximation of the exact 
null distribution.  Using the approximately exact null distribution is 
particularly helpful when the asymptotics doesn't work well, say, large 
but unbalanced data, and/or the exact computations are too time consuming.

>> groups 5 and 6 across A through H using wilcox_test, and then I'd like to
>> extract the p-values.  Going through the same methodology as above, but
>> replacing wilcox.test with wilcox_test has failed, and so has the p.value
>> extraction method:  lapply(thing, function(x) x[['p.value']]) .
>> I believe the latter failure has to do with the fact that the coin package
>> has a built-in class and built-in extraction method (pvalue() to extract and
>> class "IndependenceTest"), but I don't know how to work around it.  For
>> example, for a single comparison:  wilcox_test(A~Group, Dtb) works fine, and
>> pvalue(wilcox.test(A~Group, Dtb)) extracts the p-value.
>> So, any ideas about how to compare groups 5 and 6 across A through H using
>> wilcox_test?

Well, since you're doing multiple tests here (A, C, ..., H vs Group) you 
should consider adjusting for multiplicity and 'coin' allows you to do 
that easily and efficiently.  A multivariate Wilcoxon rank-sum test can 
be constructed using

 > set.seed(711109) # for reproducibility
 > it <- independence_test(A + C + D + E + F + G + H ~ Group, data = Dtb,
                           ytrafo = function(data)
                               trafo(data, numeric_trafo = rank),
                           distribution = approximate(B = 100000))

approximating the exact null distribution using 100,000 Monte Carlo 

Step-down adjusted p-values taking account of the dependence structure 
between the test statistics and possible discreteness in the null 
distribution are available through

 > (psd <- pvalue(it, "step-down"))
         A       C       D       E       F       G       H
5 0.08598 0.08598 0.08598 0.20018 0.08598 0.08598 0.34182

and using the development version of 'coin', available at R-Forge, we 
can get the unadjusted p-values from

 > (pu <- pvalue(it, "unadjusted"))
         A       C       D       E       F       G       H
5 0.02894 0.02894 0.02894 0.11512 0.02894 0.02894 0.34182

If we look at the ratio of step-down adjusted and unadjusted p-values,

 > psd / pu
          A        C        D        E        F        G H
5 2.970974 2.970974 2.970974 1.738881 2.970974 2.970974 1

we can see that this type of adjustment is pretty powerful compared to 
step-down methods like Bonferroni-Holm that doesn't take account of the 
correlation nor the discreteness

 > p.adjust(pu) / pu
   A C D E F G H
5 7 7 7 2 7 7 1


> I think there are a few things at play here.
> 1) coin uses so-called S4 objects, so `[[` style subsetting isn't
> going to work. The "right way" is, as you have found to use the
> pvalue() function.
> 2) It looks like you need to use the formula intervace for
> wilcox_test. Unfortunately, this makes things a little more
> complicated as you'll need to construct the formula programmatically.
> A one liner looks something like this.
> lapply(LETTERS[1:8], function(x)
> pvalue(wilcox_test(as.formula(paste(x, "~ Group ")), Dtb)))
> Where lapply loops over the letters A,B, etc. and makes the string `A
> ~ Group`, converts it to a formula, passes that to wilcox_test, then
> gets the pvalue and returns it.
> In two lines you could do:
> thing <- lapply(LETTERS[1:8], function(x)
> wilcox_test(as.formula(paste(x, "~ Group")), Dtb))
> thing2 <- lapply(thing, pvalue)
> Where thing has all the test result objects, and thing2 collects the pvalues.
> Hope this helps,
> Michael
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

More information about the R-help mailing list