[R] Wilcox Test / Mann Whitney U Test

Thu Oct 6 16:05:27 CEST 2011

And I figured it out, sorry to bother the list.

The normal approximation I was using does not correct the variance for ties, so it is not accurate when ties are present. wilcox.test applies that correction.
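
For anyone who finds this thread later, here is a minimal sketch of the difference, using the data from my original post. The names r, tieSizes, varNoTies, varTies, pManual and pBuiltin are just illustrative. The tie-corrected variance is the usual textbook form and, as far as I can tell, it is what wilcox.test uses when exact=FALSE, so treat this as a check rather than a definitive account of the function's internals.

```r
# Data from the original post (David Howell's ordinal example)
dropouts <- c(rep(0, 25), rep(3, 10), rep(2, 9), rep(1, 13), rep(4, 6))
remain   <- c(rep(0, 31), rep(3, 2), rep(2, 6), rep(1, 21), rep(4, 3))

n1 <- length(dropouts)
n2 <- length(remain)
N  <- n1 + n2

# Midranks of the pooled sample; the first n1 entries belong to dropouts
r <- rank(c(dropouts, remain))
W <- sum(r[seq_len(n1)])          # rank sum of the dropouts group

# Number of observations sharing each distinct value (the tie group sizes)
tieSizes <- as.vector(table(c(dropouts, remain)))

# Variance of the rank sum without and with the tie correction
varNoTies <- n1 * n2 * (N + 1) / 12
varTies   <- n1 * n2 / 12 *
  ((N + 1) - sum(tieSizes^3 - tieSizes) / (N * (N - 1)))

# z statistics and two-sided p-values under both variances
z <- (W - n1 * (N + 1) / 2) /
  sqrt(c(uncorrected = varNoTies, tieCorrected = varTies))
pManual <- 2 * pnorm(-abs(z))

# Built-in test, normal approximation without continuity correction
pBuiltin <- wilcox.test(dropouts, remain, correct = FALSE, exact = FALSE)$p.value

print(pManual)
print(pBuiltin)
```

The tieCorrected entry of pManual should agree with pBuiltin, while the uncorrected entry is the larger value my hand calculation produced. With this many ties the correction shrinks the variance noticeably, which is why the two p-values differ.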

Sam

On Thu, Oct 6, 2011 at 10:56 AM, Sam Stewart <rhelp.stats at gmail.com> wrote:
> So I checked it with wilcox_test in the coin package, and got the
> same result.  That makes me more confident that I made a mistake, but
> it still doesn't help me find it.
>
> library(coin)
> d = data.frame(value=c(dropouts,remain),
>                group=factor(rep(c("dropout","remain"),c(length(dropouts),length(remain)))))
> wilcox_test(value~group,data=d)
>
> Sam
>
> On Thu, Oct 6, 2011 at 10:35 AM, Sam Stewart <rhelp.stats at gmail.com> wrote:
>> Hello List,
>>
>> I'm trying to prepare some lecture notes on nonparametric methods,
>> and I can't manually reproduce the results of the wilcox.test function
>> for ordinal data.
>>
>> The data I'm using are from David Howell's website, available here:
>>
>> http://www.uvm.edu/~dhowell/StatPages/More_Stuff/OrdinalChisq/OrdinalChiSq.html
>>
>> If I run the wilcox.test function on the data I get a p-value of
>> 0.0407, but when I do it myself I get a p-value of 0.0530.  What
>> bothers me is not so much the jump across 0.05 as the fact that I
>> thought I knew what the function was doing.
>>
>> I know from the R help page that there is some controversy about how
>> exactly to calculate the test statistic, but that's not what is
>> causing the problem, as I can get the same W value.  Am I calculating
>> the test statistic incorrectly?
>>
>> Thanks, sample code below
>> Sam Stewart
>>
>> #Ordinal example
>> dropouts = c(rep(0,25),rep(3,10),rep(2,9),rep(1,13),rep(4,6))
>> remain = c(rep(0,31),rep(3,2),rep(2,6),rep(1,21),rep(4,3))
>> tab2 = rbind(table(dropouts),table(remain))
>> ordTest = wilcox.test(x=dropouts,y=remain,correct=FALSE,exact=FALSE)
>> cumsum(colSums(tab2))
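>> # midranks of the pooled sample; the first n1 entries belong to dropouts
>> r = rank(c(dropouts,remain))
>> n1 = length(dropouts)
>> n2 = length(remain)
>> # W is the larger of the two rank sums
>> W = max(sum(r[1:n1]),sum(r[-(1:n1)]))
>> # z statistic from the normal approximation to the rank sum
>> testStat = (W-n1*(n1+n2+1)/2)/(sqrt(n1*n2*(n1+n2+1)/12))
>> # two-sided p-value
>> 2*(1-pnorm(testStat))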
>>
>