[R] (Fisher) Randomization Test for Matched Pairs: Permutation Data Setup Based on Signs

peter dalgaard pdalgd at gmail.com
Sun Mar 11 09:41:44 CET 2012


On Mar 11, 2012, at 03:17 , R. Michael Weylandt wrote:

> In general, I *think* this is a hard problem (it sounds knapsack-ish)
> but since you are on small enough data sets, that's probably not so
> important: if I understand you right, this little function will help
> you.
> 
> plusminus <- function(n){
>    t(as.matrix(do.call(expand.grid, rep(list(c(-1,1)), n))))
> }
> plusminus(3)
> plusminus(5)
> 
> If you multiply the output of this function by your data set you will
> have columns corresponding to all possible sign choices: e.g.,
> 
> plusminus(3) * c(1,2,3)
> 
> Then you can colSums() using only the positive elements:
> 
> x <- plusminus(3) * c(1,2,3)
> x[x < 0] <- 0
> 
> colSums(x)
> 
> To wrap this all in one function, I'd do something like this:
> 
> test.statistic <- function(v){
>    # one column per sign assignment, one row per element of v
>    m <- t(as.matrix(do.call(expand.grid, rep(list(c(-1, 1)), length(v)))))
>    x <- m * v
>    x[x < 0] <- 0   # keep only the positive elements
>    out <- rbind(m * v, colSums(x))
>    rownames(out)[nrow(out)] <- "Sum of Positive Elements"
>    out
> }
> 
> X <- test.statistic(c(-16, -4, -7, -3, -5, +1, -10))
> X[,1:10]
> 
> Hopefully that helps (I'm a little fuzzy on your overall goal -- so
> that second bit might be a red herring)

Looks pretty much OK to me. Just one note: In this sort of problem, you can do away with the business of the sum of the positive elements and just do the sum. This is because 

sum(x[x>0])-sum(x[x<0]) == sum(abs(x))
sum(x[x>0])+sum(x[x<0]) == sum(x)

and the sum(abs(x)) is of course the same, no matter how you assign signs to x. Add the two equations and divide by two and you get

sum(x[x>0]) == (sum(x) + sum(abs(x)))/2
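
As a quick numerical check of that identity, here is a minimal sketch that evaluates both sides for every sign assignment. The vector absd is made up for illustration and is not from the thread.

```r
# Check sum(x[x>0]) == (sum(x) + sum(abs(x)))/2 on every sign assignment.
absd  <- c(1, 2, 3)   # hypothetical absolute differences
signs <- as.matrix(do.call(expand.grid, rep(list(c(-1, 1)), length(absd))))
x     <- sweep(signs, 2, absd, `*`)        # one sign-assigned sample per row

pos_sum  <- rowSums(pmax(x, 0))            # sum of the positive elements
from_sum <- (rowSums(x) + sum(absd)) / 2   # right-hand side of the identity

all.equal(pos_sum, from_sum)
```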

This in turn means that you can just do 

allsums <- as.matrix(do.call(expand.grid, rep(list(c(-1,1)), n))) %*% scores

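and then mean(allsums >= sum(scores)) to get the proportion of sign assignments whose test statistic is at least as large as the observed one (use <= instead if small sums are the extreme ones). Here n is length(scores). You can even leave the signs on the scores in the computation of allsums, because that only affects the order of the sign-permuted samples.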
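
For concreteness, here is a sketch of the whole computation on the differences from the original question. The variable names are mine, and which tail counts as "extreme" is an assumption that depends on the alternative; since these differences are almost all negative, the lower tail is probably the one of interest.

```r
d <- c(-16, -4, -7, -3, -5, 1, -10)   # differences from the original question
n <- length(d)

# All 2^n sign assignments, one per row
signs   <- as.matrix(do.call(expand.grid, rep(list(c(-1, 1)), n)))
allsums <- drop(signs %*% abs(d))      # plain sum of each sign-assigned sample

p_upper <- mean(allsums >= sum(d))     # proportion at least as large as observed
p_lower <- mean(allsums <= sum(d))     # proportion at least as small as observed

# The "sum of positive elements" statistic gives the same answer
allT     <- (allsums + sum(abs(d))) / 2
p_lowerT <- mean(allT <= sum(d[d > 0]))

c(p_upper, p_lower, p_lowerT)
```

Only two of the 128 assignments (all signs negative, or only the 1 positive) have a sum as small as the observed -44, so p_lower and p_lowerT both come out as 2/128.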


> 
> Michael
> 
> 
> On Fri, Mar 9, 2012 at 12:49 AM, Ghandalf <moolag- at hotmail.com> wrote:
>> Hi,
>> 
>> I am currently attempting to write a small program for a randomization test
>> (based on rank/combination) for matched pairs. If you will please allow me
>> to introduce you to some background information regarding the test prior to
>> my question at hand, or you may skip down to the bold portion for my issue.
>> 
>> There are two samples; the data, as I am sure you guessed, are matched
>> into pairs and each pair's difference is denoted by Di.
>> 
>> The test statistic is T = Sum(Di) (only for those Di > 0).
>> 
>> The issue I am having is with the method required in R to set up
>> the data into the proper structure. I am to consider the absolute value of
>> Di, without regard to their sign. There are 2^n ways of assigning + or -
>> signs to the set of absolute differences obtained, where n = the number of
>> Dis. That is, we can assign + signs to all n of the |Di|, or we might assign
>> + to |D1| but - signs to |D2| to |Dn|, and so forth.
>> 
>> So, for example, suppose I have D1=-16, D2=-4, D3=-7, D4=-3, D5=-5, D6=+1,
>> D7=-10, and n=7.
>> I need to consider the 2^7 ways of assigning signs that result in the lowest
>> sum of the "positive" absolute difference. To exemplify further, we have
>>
>> -16, -4, -7, -3, -5, -1, -10    T = 0
>> -16, -4, -7, -3, -5, +1, -10    T = 1
>> -16, -4, -7, +3, -5, -1, -10    T = 3
>> -16, -4, -7, +3, -5, +1, -10    T = 4
>> ... and so on.
>> 
>> So, if you are willing to help me, I am having trouble setting up my data
>> as illustrated above. How do I create (code for) the 2^n lines of data
>> required with all the possible combinations of + and - in order to calculate
>> the positive values in each line (the test statistic, T)? I have tried to
>> use combn(d=data set, n=7) with a data set, d, consisting of both the
>> positive and negative sign of the respective value, to no avail.
>> 
>> I apologize if this is lengthy; I was not sure how to ask the aforementioned
>> question without incorrectly portraying my thoughts. If any clarification is
>> required then I will be more than willing to oblige with any further
>> explanation. I have searched for possible solutions, but alas, came out
>> empty-handed.
>> 
>> Thank you.
>> 
>> --
>> View this message in context: http://r.789695.n4.nabble.com/Fisher-Randomization-Test-for-Matched-Pairs-Permutation-Data-Setup-Based-on-Signs-tp4458606p4458606.html
>> Sent from the R help mailing list archive at Nabble.com.
>> 
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
> 

-- 
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd.mes at cbs.dk  Priv: PDalgd at gmail.com


