[R] Weird differing results when using the Wilcoxon-test

Cedric Laczny cedric.laczny at gmx.de
Tue Aug 17 20:03:54 CEST 2010


Thanks for the hint.
I tested this generic example and saw the same behavior also in a more
specific example, which can be found below. This example does not involve a
continuity correction but exhibits the same "unexpected" behavior:

library(coin)  # provides wilcox_test(), statistic() and pvalue()

GDS_example = function()
{
	print("---> GDS EXAMPLE <---", quote=F)
	#For GDS1331!!!
	exp_all_200601_at = c(281.600, 209.300, 202.600, 263.700, 242.500, 
281.300, 329.100, 268.200, 311.700, 230.600, 280.600, 233.900, 270.200, 
274.000, 169.000, 154.500, 160.100, 196.900, 169.500, 102.800, 148.600, 
111.700, 117.000, 173.000, 79.100, 175.700, 149.500, 151.600, 78.700, 113.900, 
196.200)
	c1 = c(1:14)
	c2 = c(20:31)
	group1 = exp_all_200601_at[c1]
	group2 = exp_all_200601_at[c2]

	wilcox_approx = wilcox.test(group1, group2, correct=F)
	# Deviation estimate -> N.B. the TRUE deviation value can NOT be known,
	# as we would have to sample _ALL_ possible "tissue" etc.
	#wilcox.test(group1, group2, conf.int=T)$estimate

	g1f = rep(1, length(c1))
	g2f = rep(2, length(c2))
	gf = as.factor(c(g1f, g2f))

	exp_groups_200601_at = exp_all_200601_at[c(c1, c2)]
	exp_groups_data_frame = data.frame( exp_groups_200601_at, gf )
	wilcox_exact = wilcox_test(exp_groups_200601_at ~ gf,
		data = exp_groups_data_frame, conf.int=T, distribution="exact")

	print(wilcox_approx, quote=F)
	print(wilcox_exact)
	print(paste("P-value manually:", 2*(1 - pnorm(statistic(wilcox_exact)), 
quote=F) ))

}
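
For reference, below is a minimal sketch of how I understand the two-sided
normal approximation behind wilcox.test (manual_wilcox_p is just a helper
name of mine, and the continuity-correction step reflects my reading of the
documentation, not the actual source). It uses 2 * pnorm(-abs(z)) rather
than 2 * (1 - pnorm(z)), so the correct tail is used regardless of the sign
of the statistic:

manual_wilcox_p = function(x, y, correct = FALSE)
{
	# W is the Mann-Whitney statistic as reported by wilcox.test
	w = unname(wilcox.test(x, y)$statistic)
	n1 = length(x)
	n2 = length(y)
	mu = n1 * n2 / 2
	# Standard deviation of W under the null hypothesis, ignoring ties
	sigma = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
	z = w - mu
	# Continuity correction (my understanding of correct = TRUE):
	# shift the centered statistic by 0.5 towards zero before standardizing
	if (correct) z = z - sign(z) * 0.5
	z = z / sigma
	# Two-sided p-value from the normal approximation
	2 * pnorm(-abs(z))
}

When comparing this against wilcox.test(group1, group2, correct = FALSE),
note that for small samples without ties wilcox.test reports an exact
p-value by default, so exact = FALSE has to be passed there to force the
normal approximation.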

Best,

Cedric


On Tuesday, 17. August 2010 17:50:48 Thomas Lumley wrote:
> After fixing the parentheses in your code so it does run, it seems that the
> difference is that wilcox.test defaults to using a continuity correction
> and your manual calculation does not.   Use wilcox.test(big1, big2,
> correct=FALSE).
> 
>     -thomas
> 
> On Tue, 17 Aug 2010, Cedric Laczny wrote:
> > Hi,
> > 
> > I became a little confused when working with the Wilcoxon test in R.
> > As far as I understand, there are mainly two versions:
> > 1) wilcox.test{stats}, which is the default and uses an approximation,
> > especially when ties are involved
> > 2) wilcox_test{coin}, which calculates the distribution _exactly_,
> > even with ties.
> > 
> > I have the following scenario:
> > 
> > #---BeginCode---
> > # big example
> > library(coin)  # for wilcox_test() and pvalue()
> > size = 60
> > big1 = rnorm(size, 0, 1)
> > big2 = rnorm(size, 0.5, 1)
> > 
> > g1f = rep(1, size)
> > g2f = rep(2, size)
> > big = c(big1, big2)
> > data_frame = data.frame(big, gr=as.factor(c(g1f, g2f)))
> > 
> > wilcox_approx = wilcox.test(big1, big2)
> > wilcox_exact = wilcox_test(big ~ gr, data=data_frame,
> >                            distribution="exact")
> > #---EndCode---
> > 
> > I found here
> > http://www-stat.stanford.edu/~susan/courses/s141/hononpara.pdf that
> > wilcox.test (at least for the signed-rank test) uses exact p-values up
> > to N = n1 + n2 = 50.
> > I can reproduce this when using e.g. size = 15: the p-values are then
> > the same, as I would expect after reading the linked notes.
> > 
> > #---BeginCode---
> > print(paste("Wilcox approx p-value:", wilcox_approx$p.value), quote=F)
> > print(paste("Wilcox exact p-value:", pvalue(wilcox_exact)), quote=F)
> > #---EndCode---
> > 
> > That said, if I set e.g. size = 60, then the p-values of wilcox.test and
> > wilcox_test differ, as expected.
> > 
> > What puzzles me in particular are the differing results when I
> > calculate the p-value manually for larger sample sizes.
> > 
> > So, if we get the W-score from wilcox.test:
> > 
> > #---BeginCode---
> > W_big = wilcox.test(big1, big2)$statistic
> > #---EndCode---
> > 
> > and "convert" it to a Z-score, like this:
> > 
> > #---BeginCode---
> > mu_big = (size^2)/2
> > sd_big = sqrt(size*size*(size + size + 1)/12)
> > N = size + size
> > # Note: algebraically identical to sd_big, since (N^3 - N)/(N*(N - 1)) = N + 1
> > sd_big_corr = sqrt( (size * size) / (N * (N - 1)) * (N^3 - N) / 12 )
> > 
> > Z_big = (W_big - mu_big)/sd_big
> > #---EndCode---
> > 
> > The Z-Score (Z_big) is equal to the statistic of wilcox_test.
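> > 
> > As a quick check (just a sketch; the sign of coin's statistic may flip
> > depending on which factor level it treats as the first group):
> > 
> > #---BeginCode---
> > print(c(manual = as.numeric(Z_big),
> >         coin = as.numeric(statistic(wilcox_exact))))
> > # No ties in rnorm data, so the "corrected" sd gives the same Z:
> > print((W_big - mu_big) / sd_big_corr)
> > #---EndCode---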
> > So far so good. And now comes the main problem.
> > If I follow the documentation correctly, the p-value for a given
> > W-score/statistic is calculated using the normal approximation with the
> > Z-score. However, when I do that, I get a different result than what I
> > would expect: I would expect the p-value of wilcox.test to equal
> > 2*pnorm(Z_big), but in fact it does not. Please see:
> > 
> > #---BeginCode---
> > p_value_manual = 2 * pnorm(Z_big)
> > 
> > print("--- Resulting pvalues --- ", quote=F)
> > print(paste("Wilcox approx p-value:", wilcox_approx$p.value), quote=F)
> > print(paste("Wilcox exact p-value:", pvalue(wilcox_exact)), quote=F)
> > print(paste("P-value manual:", p_value_manual), quote=F)
> > #---EndCode---
> > 
> > So how is the p-value calculated in wilcox.test when the sample sizes
> > are large? That might explain why its value differs from the one
> > calculated manually.
> > 
> > Best regards,
> > 
> > Cedric
> > 
> 
> Thomas Lumley
> Professor of Biostatistics
> University of Washington, Seattle


