[R] generating unordered combinations

Bryan Keller bskeller at wisc.edu
Fri Sep 18 19:24:08 CEST 2009


The combn solution offered by Bill is great.  It struck me that what you are doing, in fact, is generating the null distribution of the two-sample Wilcoxon test where the first group has size m and the second group has size n.  In general, the length of the array has size choose(n+m-1,m) which gets very big very fast.  For example,

> choose(20+20-1,20)
[1] 68923264410

If, on the off chance that you are interested in *summing* the unordered combinations across columns, there is a very slick way to do this that takes a tiny fraction of the time and memory that generating huge arrays entails.  If not, obviously, you already have your solution.

Just in case, here is the code to generate the distribution of sums.  It is based on an algorithm due to Harding (1984).

f <- function(m,n) {

umax <- (m*n+1)

ifelse (umax%%2==0, umaxp <- (umax/2)-1, umaxp <- (umax-1)/2)
#umaxp is analagous to “M” from Harding (1984)

p <- min((m+n),umaxp)
q <- min(m,umaxp)

dis <- c(1,numeric(umaxp))

if ((n+1)>umaxp) {

for (i in 1:q) {			#steps for denominator of generating function
for (j in i:umaxp) {
	dis[j+1] <- (dis[j+1] + dis[(j-i)+1])
	}
	}

} else {

for (i in (n+1):(p)) {		#steps for numerator of generating function
for (j in (umaxp):i) {
	dis[j+1] <- (dis[j+1] - dis[(j-i)+1])
	}
	}

for (i in 1:q) {			#steps for denominator of generating function
for (j in i:umaxp) {
	dis[j+1] <- (dis[j+1] + dis[(j-i)+1])
	}
	}

}

ldis <- length(dis)
ifelse(umax%%2==0,dis <- c(dis,dis[ldis:1]),dis <- c(dis,dis[(ldis-1):1]))
dispr <- dis/choose((n+m),n)
ws <- sum(1:m):sum((n+1):(n+m))
lws <- length(ws)
mat3 <- cbind(ws,dis,dispr,numeric(lws),numeric(lws))
mat3[,4] <- cumsum(mat3[,3])
mat3[,5] <- cumsum(mat3[,3][lws:1])[lws:1]
colnames(mat3) <- c("W","Freq","Probability","Sum up","Sum down")

print(mat3)
}

> system.time(f(20,20))
   user  system elapsed 
   0.11    0.00    0.11 

Bryan




That's brilliant - thanks.

On 17 Sep 2009, at 23:36, William Dunlap wrote:

> There is a 1-1 correspondance between your n-sets
> consisting of m possible element types (0 through m-1
> in your example) and the number of n-subsets of a (n+m-1)-set.
> E.g., your example had m=3 and n=3 and subtracting
> 1:3 from each column of combn(3+3-1,3) gives your result:
>
>> t(combn(3+3-1, 3)-(1:3))
>      [,1] [,2] [,3]
> [1,]    0    0    0
> [2,]    0    0    1
> [3,]    0    0    2
> [4,]    0    1    1
> [5,]    0    1    2
> [6,]    0    2    2
> [7,]    1    1    1
> [8,]    1    1    2
> [9,]    1    2    2
> [10,]    2    2    2
>
> Bill Dunlap
> TIBCO Software Inc - Spotfire Division
> wdunlap tibco.com
>
>> -----Original Message-----
>> From: r-help-bounces at r-project.org
>> [mailto:r-help-bounces at r-project.org] On Behalf Of Dan Halligan
>> Sent: Thursday, September 17, 2009 1:31 PM
>> To: r-help at r-project.org
>> Subject: [R] generating unordered combinations
>>
>> Hi,
>>
>> I am trying to generate all unordered combinations of a set of
>> numbers / characters, and I can only find a (very) clumsy way of  
>> doing
>> this using expand.grid.  For example, all unordered combinations of
>> the numbers 0, 1, 2 are:
>> 0, 0, 0
>> 0, 0, 1
>> 0, 0, 2
>> 0, 1, 1
>> 0, 1, 2
>> 0, 2, 2
>> 1, 1, 1
>> 1, 1, 2
>> 1, 2, 2
>> 2, 2, 2
>>
>> (I have not included, for example, 1, 0, 0, since it is equivalent to
>> 0, 0, 1).
>>
>> I have found a way to generate this data.frame using expand.grid as
>> follows:
>>
>> g <- expand.grid(c(0,1,2), c(0,1,2), c(0,1,2))
>> for(i in 1:nrow(g)) {
>> 	g[i,] <- sort(as.character(g[i,]))
>> }
>> o <- order(g$Var1, g$Var2, g$Var3)
>> unique(g[o,]).
>>
>> This is obviously quite clumsy and hard to generalise to a greater
>> number of characters, so I'm keen to find any other solutions.  Can
>> anyone suggest a better (more general, quicker) method?
>>
>> Cheers
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>


-------------
Bryan Keller, Doctoral Student/Project Assistant
Educational Psychology - Quantitative Methods
The University of Wisconsin - Madison




More information about the R-help mailing list