[R] non-isomorphic sequences

Petr Savicky savicky at cs.cas.cz
Mon Feb 13 20:53:39 CET 2012


On Mon, Feb 13, 2012 at 10:05:02AM -0800, zheng wei wrote:
> Dear All,
> 
> Sorry for the typoes earlier, let me repost the question.
> 
> Suppose I want to generate sequences of length 3 from two symbols {1,2}, we get the following 8 sequences
> 1 1 1
> 1 1 2
> 1 2 1
> 1 2 2
> 2 1 1
> 2 1 2
> 2 2 1
> 2 2 2
> 
> However, I do not want all these 8 sequences. I call two sequencs to be isomorphic if one sequence could be obtained from the other by relabelling the symbols. For example, 111 is isomorphic to 222, 112 is isomorphic to 221.?By eliminating all these isomorphic ones, what I want is the following
> 1 1 1
> 1 1 2
> 1 2 1
> 2 1 1

Eliminating isomorphic sequences may be done differently,
if we select different representatives of each equivalence
class. The following also eliminates isomorphic 1,2 sequences

  1 1 1
  1 1 2
  1 2 1
  1 2 2

Is this solution OK?

> In general, I need to generate non-isomorphic sequences of length p from t distinct symbols. For example, when p=3, t=3 we have
> matrix(c(1,2,3,1,1,2,2,1,1,1,2,1,1,1,1),3,5)
> 
> [1,]??? 1??? 1??? 2??? 1??? 1
> [2,]??? 2??? 1??? 1??? 2??? 1
> [3,]??? 3??? 2??? 1??? 1??? 1
> 
> When p=4, t=4 we have
> matrix(c(1,2,3,4,1,1,2,3,1,2,1,3,1,2,3,1,2,1,1,3,2,3,1,1,2,1,3,1,1,1,2,2,1,2,1,2,1,2,2,1,1,1,1,2,1,1,2,1,1,2,1,1,2,1,1,1,1,1,1,1),4,15)
> 
> [1,]??? 1??? 1??? 1??? 1??? 2??? 2??? 2??? 1??? 1???? 1???? 1???? 1???? 1???? 2???? 1
> [2,]??? 2??? 1??? 2??? 2??? 1??? 3??? 1??? 1??? 2???? 2???? 1???? 1???? 2???? 1???? 1
> [3,]??? 3??? 2??? 1??? 3??? 1??? 1??? 3??? 2??? 1???? 2???? 1???? 2???? 1???? 1???? 1
> [4,]??? 4??? 3??? 3??? 1??? 3??? 1??? 1??? 2??? 2???? 1???? 2???? 1???? 1???? 1???? 1
> 
> 
> In general, I need to do this for arbitrary choices of p and t.

If p and t are not too large, try the following

  check.row <- function(x)
  {
      y <- unique(x)
      all(y == seq.int(along=y))
  }

  p <- 4
  tt <- 4 # do not rewrite t() for transpose
  elem <- lapply(as.list(pmin(1:p, tt)), function(x) seq.int(length=x))
  a <- as.matrix(rev(expand.grid(rev(elem))))
  ok <- apply(a, 1, check.row)
  out <- a[ok, ]
  out

        Var4 Var3 Var2 Var1
   [1,]    1    1    1    1
   [2,]    1    1    1    2
   [3,]    1    1    2    1
   [4,]    1    1    2    2
   [5,]    1    1    2    3
   [6,]    1    2    1    1
   [7,]    1    2    1    2
   [8,]    1    2    1    3
   [9,]    1    2    2    1
  [10,]    1    2    2    2
  [11,]    1    2    2    3
  [12,]    1    2    3    1
  [13,]    1    2    3    2
  [14,]    1    2    3    3
  [15,]    1    2    3    4

This solution differs from yours, for example, in
the row c(1, 2, 3, 3), which is in your solution
represented by c(2, 3, 1, 1). This a different choice
of the representatives. Is the choice important?

A related thread started at

  https://stat.ethz.ch/pipermail/r-help/2012-January/301489.html

There was an additional requirement that each of t symbols
has at least one occurrence.

Hope this helps.

Petr Savicky.



More information about the R-help mailing list