[R] Generating groupings of ordered observations

Gavin Simpson gavin.simpson at ucl.ac.uk
Sat Jun 21 21:30:02 CEST 2008


Dear Chuck,

That is, quite simply, superb! The members of this list never cease to
amaze me, and your response is no exception.

As an aside, does this type of problem come under a particular name or
topic in mathematics?

The ordering need not be exactly as I had it in my example --- that just
represented the way I had tried to look at the problem using pen and
paper (!)

For the record and the archives, the following achieves a slightly
different ordering to the one in my example, but which is equally usable
for my application; ordered in terms of number of groups.

> n <- 6
> base10 <- seq(0, length=2^(n-1) )
> base2.bits <- outer(0:(n-2), base10, function(y,x) ( x %/% (2^y)) %%2 )
> res <- sapply(apply( base2.bits==1, 2, which ), function(x) rep(1:(1+length(x)), diff(c(0,x,n))))
> ord <- order(apply(res, 2, function(x) length(unique(x))))
> res[,ord]

And your pointing me to choose(n-1, 0:(n-1)) as the basis to determine
the number of possible combinations, allows me to break the result
(res[,ord]) down into the chunks I need to work with.

Many thanks,

Gavin

On Sat, 2008-06-21 at 11:30 -0700, Charles C. Berry wrote:
> On Sat, 21 Jun 2008, Gavin Simpson wrote:
> 
> > Dear List,
> >
> > I have a problem I'm finding it difficult to make headway with.
> >
> > Say I have 6 ordered observations, and I want to find all combinations
> > of splitting these 6 ordered observations in g groups, where g = 1, ...,
> > 6. Groups can only be formed by adjacent observations, so observations 1
> > and 4 can't be in a group on their own, only if 1,2,3&4 are all in the
> > group.
> >
> 
> Right. And in the example below there are 32 distinct patterns.
> 
> Which arises from sum( choose( 5, 0:5 ) ) different placements of 0:5 
> split positions.
> 
> You can represent the splits as a binary number with n-1 bits: 00000 
> implies no splits, 10000 implies a split between 1 and 2, 10100 implies 
> splits between 1 and 2 and between 3 and 4, et cetera.
> 
> So, 32 arises as 2^5, too.
> 
> Something like this:
> 
> > base10 <- seq(0, length=2^(n-1) )
> > base2.bits <- outer(0:(n-2), base10, function(y,x) ( x %/% (2^y)) %%2 )
> > sapply(apply( base2.bits==1, 2, which ), function(x) rep(1:(1+length(x)), diff(c(0,x,n))))
> 
> Getting this in the same column order as your example is left as an 
> exercise for the reader.
> 
> HTH,
> 
> Chuck
> 
> > For example, with 6 observations, the columns of the matrices below
> > represent the groups that can be formed by placing the 6 ordered
> > observations into 2-5 groups. Think of the columns of these matrices as
> > being an indicator of group membership. We then cbind these matrices
> > with the trivial partitions into 1 and 6 groups:
> >
> > mat2g <- matrix(c(1,1,1,1,1,
> >                  2,1,1,1,1,
> >                  2,2,1,1,1,
> >                  2,2,2,1,1,
> >                  2,2,2,2,1,
> >                  2,2,2,2,2),
> >                nrow = 6, ncol = 5, byrow = TRUE)
> >
> > mat3g <- matrix(c(1,1,1,1,1,1,1,1,1,1,
> >                  2,2,2,2,1,1,1,1,1,1,
> >                  3,2,2,2,2,2,2,1,1,1,
> >                  3,3,2,2,3,2,2,2,2,1,
> >                  3,3,3,2,3,3,2,3,2,2,
> >                  3,3,3,3,3,3,3,3,3,3),
> >                nrow = 6, ncol = 10, byrow = TRUE)
> >
> > mat4g <- matrix(c(1,1,1,1,1,1,1,1,1,1,
> >                  2,2,2,2,2,2,1,1,1,1,
> >                  3,3,3,2,2,2,2,2,2,1,
> >                  4,3,3,3,3,2,3,3,2,2,
> >                  4,4,3,4,3,3,4,3,3,3,
> >                  4,4,4,4,4,4,4,4,4,4),
> >                nrow = 6, ncol = 10, byrow = TRUE)
> >
> > mat5g <- matrix(c(1,1,1,1,1,
> >                  2,2,2,2,1,
> >                  3,3,3,2,2,
> >                  4,4,3,3,3,
> >                  5,4,4,4,4,
> >                  5,5,5,5,5),
> >                nrow = 6, ncol = 5, byrow = TRUE)
> >
> > cbind(rep(1,6), mat2g, mat3g, mat4g, mat5g, 1:6)
> >
> > I'd like to be able to do this automagically, for any (reasonable,
> > small, say n = 10-20) number of observations, n, and for g = 1, ..., n
> > groups.
> >
> > I can't see the pattern here or a way forward. Can anyone suggest an
> > approach?
> >
> > Thanks in advance,
> >
> > Gavin
> >
> > -- 
> > %~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
> > Dr. Gavin Simpson             [t] +44 (0)20 7679 0522
> > ECRC, UCL Geography,          [f] +44 (0)20 7679 0565
> > Pearson Building,             [e] gavin.simpsonATNOSPAMucl.ac.uk
> > Gower Street, London          [w] http://www.ucl.ac.uk/~ucfagls/
> > UK. WC1E 6BT.                 [w] http://www.freshwaters.org.uk
> > %~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
> >
> > ______________________________________________
> > R-help at r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> >
> 
> Charles C. Berry                            (858) 534-2098
>                                              Dept of Family/Preventive Medicine
> E mailto:cberry at tajo.ucsd.edu	            UC San Diego
> http://famprevmed.ucsd.edu/faculty/cberry/  La Jolla, San Diego 92093-0901
> 
> 
-- 
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
 Dr. Gavin Simpson             [t] +44 (0)20 7679 0522
 ECRC, UCL Geography,          [f] +44 (0)20 7679 0565
 Pearson Building,             [e] gavin.simpsonATNOSPAMucl.ac.uk
 Gower Street, London          [w] http://www.ucl.ac.uk/~ucfagls/
 UK. WC1E 6BT.                 [w] http://www.freshwaters.org.uk
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%



More information about the R-help mailing list