[R] Generating groupings of ordered observations

Charles C. Berry cberry at tajo.ucsd.edu
Sat Jun 21 23:03:57 CEST 2008


On Sat, 21 Jun 2008, Gavin Simpson wrote:

> Dear Chuck,
>
> That is, quite simply, superb! The members of this list never cease to
> amaze me, and your response is no exception.
>
> As an aside, does this type of problem come under a particular name or
> topic in mathematics?

Yes. Combinatorics. See

 	http://en.wikipedia.org/wiki/Combinatorics

and/or

 	http://www.amazon.com/Introductory-Combinatorics-Kenneth-P-Bogart/dp/0121108309/ref=pd_bbs_sr_2?ie=UTF8&s=books&qid=1214082041&sr=8-2

(or any other intro text)

HTH,

Chuck

>
> The ordering need not be exactly as I had it in my example --- that just
> represented the way I had tried to look at the problem using pen and
> paper (!)
>
> For the record and the archives, the following achieves a slightly
> different ordering to the one in my example, but which is equally usable
> for my application; ordered in terms of number of groups.
>
>> n <- 6
>> base10 <- seq(0, length=2^(n-1) )
>> base2.bits <- outer(0:(n-2), base10, function(y,x) ( x %/% (2^y)) %%2 )
>> res <- sapply(apply( base2.bits==1, 2, which ), function(x) rep(1:(1+length(x)), diff(c(0,x,n))))
>> ord <- order(apply(res, 2, function(x) length(unique(x))))
>> res[,ord]
>
> And your pointing me to choose(n-1, 0:(n-1)) as the basis to determine
> the number of possible combinations, allows me to break the result
> (res[,ord]) down into the chunks I need to work with.
>
> Many thanks,
>
> Gavin
>
> On Sat, 2008-06-21 at 11:30 -0700, Charles C. Berry wrote:
>> On Sat, 21 Jun 2008, Gavin Simpson wrote:
>>
>>> Dear List,
>>>
>>> I have a problem I'm finding it difficult to make headway with.
>>>
>>> Say I have 6 ordered observations, and I want to find all combinations
>>> of splitting these 6 ordered observations in g groups, where g = 1, ...,
>>> 6. Groups can only be formed by adjacent observations, so observations 1
>>> and 4 can't be in a group on their own, only if 1,2,3&4 are all in the
>>> group.
>>>
>>
>> Right. And in the example below there are 32 distinct patterns.
>>
>> Which arises from sum( choose( 5, 0:5 ) ) different placements of 0:5
>> split positions.
>>
>> You can represent the splits as a binary number with n-1 bits: 00000
>> implies no splits, 10000 implies a split between 1 and 2, 10100 implies
>> splits between 1 and 2 and between 3 and 4, et cetera.
>>
>> So, 32 arises as 2^5, too.
>>
>> Something like this:
>>
>>> base10 <- seq(0, length=2^(n-1) )
>>> base2.bits <- outer(0:(n-2), base10, function(y,x) ( x %/% (2^y)) %%2 )
>>> sapply(apply( base2.bits==1, 2, which ), function(x) rep(1:(1+length(x)), diff(c(0,x,n))))
>>
>> Getting this in the same column order as your example is left as an
>> exercise for the reader.
>>
>> HTH,
>>
>> Chuck
>>
>>> For example, with 6 observations, the columns of the matrices below
>>> represent the groups that can be formed by placing the 6 ordered
>>> observations into 2-5 groups. Think of the columns of these matrices as
>>> being an indicator of group membership. We then cbind these matrices
>>> with the trivial partitions into 1 and 6 groups:
>>>
>>> mat2g <- matrix(c(1,1,1,1,1,
>>>                  2,1,1,1,1,
>>>                  2,2,1,1,1,
>>>                  2,2,2,1,1,
>>>                  2,2,2,2,1,
>>>                  2,2,2,2,2),
>>>                nrow = 6, ncol = 5, byrow = TRUE)
>>>
>>> mat3g <- matrix(c(1,1,1,1,1,1,1,1,1,1,
>>>                  2,2,2,2,1,1,1,1,1,1,
>>>                  3,2,2,2,2,2,2,1,1,1,
>>>                  3,3,2,2,3,2,2,2,2,1,
>>>                  3,3,3,2,3,3,2,3,2,2,
>>>                  3,3,3,3,3,3,3,3,3,3),
>>>                nrow = 6, ncol = 10, byrow = TRUE)
>>>
>>> mat4g <- matrix(c(1,1,1,1,1,1,1,1,1,1,
>>>                  2,2,2,2,2,2,1,1,1,1,
>>>                  3,3,3,2,2,2,2,2,2,1,
>>>                  4,3,3,3,3,2,3,3,2,2,
>>>                  4,4,3,4,3,3,4,3,3,3,
>>>                  4,4,4,4,4,4,4,4,4,4),
>>>                nrow = 6, ncol = 10, byrow = TRUE)
>>>
>>> mat5g <- matrix(c(1,1,1,1,1,
>>>                  2,2,2,2,1,
>>>                  3,3,3,2,2,
>>>                  4,4,3,3,3,
>>>                  5,4,4,4,4,
>>>                  5,5,5,5,5),
>>>                nrow = 6, ncol = 5, byrow = TRUE)
>>>
>>> cbind(rep(1,6), mat2g, mat3g, mat4g, mat5g, 1:6)
>>>
>>> I'd like to be able to do this automagically, for any (reasonable,
>>> small, say n = 10-20) number of observations, n, and for g = 1, ..., n
>>> groups.
>>>
>>> I can't see the pattern here or a way forward. Can anyone suggest an
>>> approach?
>>>
>>> Thanks in advance,
>>>
>>> Gavin
>>>
>>> --
>>> %~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
>>> Dr. Gavin Simpson             [t] +44 (0)20 7679 0522
>>> ECRC, UCL Geography,          [f] +44 (0)20 7679 0565
>>> Pearson Building,             [e] gavin.simpsonATNOSPAMucl.ac.uk
>>> Gower Street, London          [w] http://www.ucl.ac.uk/~ucfagls/
>>> UK. WC1E 6BT.                 [w] http://www.freshwaters.org.uk
>>> %~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
>>>
>>> ______________________________________________
>>> R-help at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>>
>>
>> Charles C. Berry                            (858) 534-2098
>>                                              Dept of Family/Preventive Medicine
>> E mailto:cberry at tajo.ucsd.edu	            UC San Diego
>> http://famprevmed.ucsd.edu/faculty/cberry/  La Jolla, San Diego 92093-0901
>>
>>
> -- 
> %~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
> Dr. Gavin Simpson             [t] +44 (0)20 7679 0522
> ECRC, UCL Geography,          [f] +44 (0)20 7679 0565
> Pearson Building,             [e] gavin.simpsonATNOSPAMucl.ac.uk
> Gower Street, London          [w] http://www.ucl.ac.uk/~ucfagls/
> UK. WC1E 6BT.                 [w] http://www.freshwaters.org.uk
> %~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

Charles C. Berry                            (858) 534-2098
                                             Dept of Family/Preventive Medicine
E mailto:cberry at tajo.ucsd.edu	            UC San Diego
http://famprevmed.ucsd.edu/faculty/cberry/  La Jolla, San Diego 92093-0901



More information about the R-help mailing list