[R] problem with wilcox.test()

peter dalgaard pdalgd at gmail.com
Thu Jul 16 15:47:00 CEST 2015


> On 16 Jul 2015, at 15:13 , Ivan Calandra <ivan.calandra at univ-reims.fr> wrote:
> 
> Dear useRs,
> 
> I am running a wilcox.test() on two subsets of a dataset and get exactly the same results although the raw data are different in the subsets.
> 
> mydata <- structure(list(
>   cat1 = structure(c(2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
>     2L, 2L, 2L, 2L), .Label = c("high", "low"), class = "factor"),
>   cat2 = structure(c(1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L,
>     1L, 2L, 1L, 2L), .Label = c("large", "small"), class = "factor"),
>   var1 = c(2.012743, 1.51272, 1.328453, 1.2609935, 1.617757, 1.8175455,
>     1.890035, 2.3652205, 1.295888, 1.5985145, 1.081813, 1.856733, 2.366358,
>     2.27421, 1.727023, 2.230433, 5.272843, 3.7626355),
>   var2 = c(0.00196, 0.0066545, 0.006188, 0.0058985, 0.004453, 0.005468,
>     0.003773, 0.004742, 0.007525, 0.0081235, 0.004611, 0.0050475, 0.006643,
>     0.0097335, 0.009213, 0.0049525, 0.006243, 0.006021)),
>   .Names = c("cat1", "cat2", "var1", "var2"), row.names = c(NA, 18L),
>   class = "data.frame")
> 
> #p-values are identical but W different for the first variable
> wilcox.test(var1~cat1, data=mydata[mydata$cat2=="large",])
> wilcox.test(var1~cat1, data=mydata[mydata$cat2=="small",])
> 
> #both p-values and W are identical for the second variable
> wilcox.test(var2~cat1, data=mydata[mydata$cat2=="large",])
> wilcox.test(var2~cat1, data=mydata[mydata$cat2=="small",])
> 
> Did I do something wrong or does it just have something to do with my dataset? Or is it just a coincidence?

Coincidence, mostly, I think:

You have

> table(mydata[mydata$cat2=="small","cat1"])

high  low 
   4    5 

> table(mydata[mydata$cat2=="large","cat1"])

high  low 
   4    5 

and, within each subset, all values of each response variable are distinct, so there are no ties and wilcox.test() computes exact p-values.
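
A quick way to check the no-ties claim, as a sketch only: the data frame below is rebuilt from the dput() output in the question, with cat1 and cat2 written compactly via rep() (that shortcut and the helper name has_ties are mine, not part of the original post).

```r
# Rebuild the posted data; cat1/cat2 are written with rep() instead of the
# integer codes from dput(), the numeric values are copied from the question.
mydata <- data.frame(
  cat1 = factor(rep(c("low", "high", "low"), times = c(6, 8, 4))),
  cat2 = factor(rep(c("large", "small"), times = 9)),
  var1 = c(2.012743, 1.51272, 1.328453, 1.2609935, 1.617757, 1.8175455,
           1.890035, 2.3652205, 1.295888, 1.5985145, 1.081813, 1.856733,
           2.366358, 2.27421, 1.727023, 2.230433, 5.272843, 3.7626355),
  var2 = c(0.00196, 0.0066545, 0.006188, 0.0058985, 0.004453, 0.005468,
           0.003773, 0.004742, 0.007525, 0.0081235, 0.004611, 0.0050475,
           0.006643, 0.0097335, 0.009213, 0.0049525, 0.006243, 0.006021)
)

# For each cat2 subset: does either response contain a repeated value?
# anyDuplicated() returns 0 when all values are distinct.
has_ties <- sapply(split(mydata, mydata$cat2), function(d)
  c(var1 = anyDuplicated(d$var1) > 0,
    var2 = anyDuplicated(d$var2) > 0))
print(has_ties)
```

If every entry is FALSE, neither subset has ties in either variable, which is the condition under which wilcox.test() defaults to the exact test for samples this small.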

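In both cases, the null distribution of W (the rank sum of the "high" group minus its smallest possible value) is that of (sum(sample(1:9,4))-sum(1:4)), which is a distribution on 0:20, symmetric around 10. The two-sided p-value depends only on abs(W-10), which can take the values 0:10. Hence there are only 11 different p-values possible, so it is not particularly odd that you may get the same one twice. It also means that two different W values at the same distance from 10 give identical p-values, which is what happens for var1.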
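
To make the counting argument concrete, here is a minimal sketch that enumerates the null distribution directly instead of sampling from it. It assumes the exact two-sided p-value is twice the smaller tail probability, capped at 1; the variable names (rank_sums, null_pmf, p_two_sided) are only illustrative.

```r
# All ways to choose which 4 of the ranks 1..9 belong to the "high" group.
rank_sums <- combn(9, 4, FUN = sum)   # one rank sum per combination
W <- rank_sums - sum(1:4)             # shift so the smallest value is 0

# Exact null distribution of W on 0:20 (every combination is equally likely).
null_pmf <- as.vector(table(factor(W, levels = 0:20))) / length(W)

# Cross-check against R's built-in Wilcoxon distribution with m = 4, n = 5.
stopifnot(isTRUE(all.equal(null_pmf, dwilcox(0:20, 4, 5))))

# Two-sided exact p-value for each possible W:
# twice the smaller tail probability, capped at 1.
p_two_sided <- sapply(0:20, function(w)
  min(1, 2 * min(mean(W <= w), mean(W >= w))))

print(data.frame(W = 0:20, p = round(p_two_sided, 4)))

# Number of distinct p-values across all 21 possible values of W.
n_distinct <- length(unique(round(p_two_sided, 10)))
print(n_distinct)
```

The printed table is symmetric in W around 10, so any W and 20 - W share a p-value.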


> 
> Thank you in advance for your help,
> Ivan
> 
> -- 
> Ivan Calandra, ATER
> University of Reims Champagne-Ardenne
> GEGENAA - EA 3795
> CREA - 2 esplanade Roland Garros
> 51100 Reims, France
> +33(0)3 26 77 36 89
> ivan.calandra at univ-reims.fr
> https://www.researchgate.net/profile/Ivan_Calandra

-- 
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd.mes at cbs.dk  Priv: PDalgd at gmail.com