[R] p-val issue for ranked two-group test

Fri Oct 21 00:42:13 CEST 2011

On Oct 20, 2011, at 23:48 , Joshua Wiley wrote:

> Hi,
> 
> It looks like you are trying to manually bootstrap.  

Nope. It's a manually performed approximate Wilcoxon test. Which is fair enough if the object is to learn something. (Notice, however, that the ExactRankTests package eats this sort of problem for breakfast.)

As for the actual code, the problem seems mainly to be unclear thinking. The most glaring problem is that "exptdiff" is never actually calculated, but even if it were, it wouldn't be  what the comment says that it is. If it was, it would logically be zero since the "expected mean rank" is the same in both groups (under the null, but what else could be meant?). More likely the intention is to calculate the _observed_ difference in the mean rank, as is done for pdiff (what does "p" stand for??) inside the loop, but based on both.ranks rather than on x. With clarified thinking, I'd expect things to fall into place rather easily.

> Take a look at:
> 
> require(boot)
> ?boot
> 
> as an added advantage of using boot instead of trying to do it
> manually, you can easily parallelize.  In fact, if you are using one
> of the pre-release versions of 2.14.0, the new parallel package is
> included by default and you do not even have to go venturing out into
> the wide world of CRAN to look.  That said, there are several aspects
> of your code that could be readily vectorized.  More specific details
> supplied if a less homework-like example is provided.
> 
> Cheers,
> 
> Josh
> 
> On Thu, Oct 20, 2011 at 10:17 AM, Laurel Klein Serieys
> <laurelklein at ucla.edu> wrote:
>> Hi-
>> I'm wondering if anyone can help me with my code.  I'm coming up dry when I
>> try to get a p-value from the following code.  If I make a histogram of my
>> resampled distribution, I find the difference between by groups to be
>> significant.  I've ranked the data since I have outliers in one of my
>> groups.
>> 
>> mange= c(35,  60,  81, 158, 89, 130,  90,  38, 119, 137,  52,  30,  27, 115,
>> 123,  31, 124,  91)
>> 
>> healthy= c(46, 50, 30, 58, 32, 42, 42, 33, 19, 42, 30, 26, 38, 23, 16, 28,
>> 42, 42, 33, 35, 51, 31, 39, 40 , 42, 38, 36, 39, 38)
>> 
>> l.mange<-length(mange)
>> l.healthy<-length(healthy)
>> 
>> exptdiff <- mean.mange - mean.healthy #the expected difference between
>> between the mean of the ranked groups
>> 
>> 
>> both.chemistry<-c(mange, healthy) #concatenate two vectors into one in
>> preparation for resampling the data
>> 
>> 
>> both.ranks<-rank(both.chemistry) #rank combined data in the case that there
>> are outlying values in the data or the dataset is small
>> 
>> reps=1000
>> 
>> 
>> z<-rep(NA,reps) # z will the the simulated storage value for the resampling
>> efforts
>> 
>> for(i in 1:reps){ #create the loop
>> 
>> x<- sample(both.ranks, length(both.ranks),replace=FALSE) #instructions for
>> how to resample where sample the entire combined data without replacment
>> 
>> p.mange<-mean(x[(1:l.mange)])  #create a simulate mean value for the
>> resampled mange values
>> p.healthy<-mean(x[(l.mange+1):(l.mange+l.healthy)])  #create a simulated
>> mean value for the resampled healthy values
>> 
>> pdiff<- p.mange-p.healthy #the simulated difference between groups
>> 
>> z[i]<- pdiff  #the stored list of simulated differences
>> }
>> p=mean(z>=exptdiff)*2 #2-tailed test multiply by two
>> p
>> 
>> hist(z, xlab="Resample Values", main="Distribution for Two-Group BUN Test")
>> confints=quantile(z, c(0.025,0.975))
>> abline(v=confints, col="blue") #draw a line for each cutoffs
>> abline(v=exptdiff, col="red")
>> 
>> Thanks!
>> L.Serieys
>> 
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>> 
> 
> 
> 
> -- 
> Joshua Wiley
> Ph.D. Student, Health Psychology
> Programmer Analyst II, ATS Statistical Consulting Group
> University of California, Los Angeles
> https://joshuawiley.com/
> 
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

-- 
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd.mes at cbs.dk  Priv: PDalgd at gmail.com