[R] Searching for a pattern within a vector

Petr Savicky savicky at cs.cas.cz
Fri Feb 24 09:19:43 CET 2012


On Fri, Feb 24, 2012 at 01:00:00PM +0530, Apoorva Gupta wrote:
> Dear R users,
> 
> I have a data.frame as follows
> 
>      a b c d e
>  [1,] 1 1 1 0 0
>  [2,] 1 1 0 0 0
>  [3,] 1 1 0 0 0
>  [4,] 0 1 1 1 1
>  [5,] 0 1 1 1 1
>  [6,] 1 1 1 1 1
>  [7,] 1 1 1 0 1
>  [8,] 1 1 1 0 1
>  [9,] 1 1 1 0 0
> [10,] 1 1 1 0 0
> 
> Within these 5 vectors, I want to choose those vectors for which I
> have the pattern (0,0,1,1,1,1) occurring anywhere in the vector.
> This means I want vectors a,c,e and not b and d.

Hi.

A related thread was 

  [R] matching a sequence in a vector?

which started at

  https://stat.ethz.ch/pipermail/r-help/2012-February/303608.html
  https://stat.ethz.ch/pipermail/r-help/attachments/20120215/989a2e88/attachment.pl

and a summary of suggested solutions was at

  https://stat.ethz.ch/pipermail/r-help/2012-February/303756.html

Try the following, where any of the functions occur* described there
may be used instead of occur1. The original function returned the
vector "candidate" of the indices at which an occurrence of "patrn"
in "exmpl" starts. For your purposes, the function has to be modified
in two ways.

  1. The output is the condition length(candidate) != 0 instead of "candidate".
  2. The argument "exmpl" becomes the first argument, so that lapply() and
     apply() can pass each column to the function directly.
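
For comparison, here is a rough sketch of what the index-returning variant
looks like. It is reconstructed from the description above and is not copied
from the earlier thread; the name occur1_sketch and the argument order
(patrn first) are assumptions made for illustration.

```r
# Hypothetical reconstruction: returns the start indices of "patrn" in "exmpl".
# The name occur1_sketch and the argument order are assumptions.
occur1_sketch <- function(patrn, exmpl)
{
  m <- length(patrn)
  n <- length(exmpl)
  # all start positions at which the pattern could still fit
  candidate <- seq_len(max(n - m + 1, 0))
  for (i in seq_len(m)) {
    # keep only the starts whose i-th element matches the pattern
    candidate <- candidate[patrn[i] == exmpl[candidate + i - 1]]
  }
  candidate
}

# column "a" of the example data: the pattern starts at position 4
occur1_sketch(c(0, 0, 1, 1, 1, 1), c(1, 1, 1, 0, 0, 1, 1, 1, 1, 1))
```

Testing length(candidate) != 0 on this result gives the TRUE/FALSE answer
needed for selecting columns.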

  # your data frame
  df <- structure(list(a = c(1L, 1L, 1L, 0L, 0L, 1L, 1L, 1L, 1L, 1L), 
    b = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), c = c(1L, 0L, 0L, 1L, 1L, 1L,
    1L, 1L, 1L, 1L), d = c(0L, 0L, 0L, 1L, 1L, 1L, 0L, 0L, 0L, 0L),
    e = c(0L, 0L, 0L, 1L, 1L, 1L, 1L, 1L, 0L, 0L)), .Names = c("a", "b", "c",
    "d", "e"), class = "data.frame", row.names = c(NA, -10L))

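  # modified function occur1: TRUE if "patrn" occurs anywhere in "exmpl"
  testoccur1 <- function(exmpl, patrn)
  {
    m <- length(patrn)
    n <- length(exmpl)
    # possible start positions; none if the pattern is longer than the vector
    candidate <- seq_len(max(n - m + 1, 0))
    for (i in seq_len(m)) {
        # keep only the starts whose i-th element matches the pattern
        candidate <- candidate[patrn[i] == exmpl[candidate + i - 1]]
    }
    length(candidate) != 0
  }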

  selection <- unlist(lapply(df, testoccur1, patrn=c(0,0,1,1,1,1)))
  selection 

      a     b     c     d     e 
   TRUE FALSE  TRUE FALSE  TRUE 

  df[, selection]

     a c e
  1  1 1 0
  2  1 0 0
  3  1 0 0
  4  0 1 1
  5  0 1 1
  6  1 1 1
  7  1 1 1
  8  1 1 1
  9  1 1 0
  10 1 1 0
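
If only one column happens to match, indexing a data frame with [, selection]
returns a plain vector rather than a data frame. The following is a small
sketch of a more defensive variant, assuming that is not what you want. The
names has_pattern, dat, keep and res are made up for this example, and
has_pattern uses the same logic as testoccur1.

```r
# Hypothetical helper with the same logic as testoccur1
has_pattern <- function(exmpl, patrn)
{
  m <- length(patrn)
  n <- length(exmpl)
  candidate <- seq_len(max(n - m + 1, 0))
  for (i in seq_len(m)) {
    candidate <- candidate[patrn[i] == exmpl[candidate + i - 1]]
  }
  length(candidate) != 0
}

# reduced example data: only column "a" contains the pattern
dat <- data.frame(
  a = c(1, 1, 1, 0, 0, 1, 1, 1, 1, 1),
  b = rep(1, 10),
  d = c(0, 0, 0, 1, 1, 1, 0, 0, 0, 0)
)

# vapply checks that each column yields exactly one logical value
keep <- vapply(dat, has_pattern, logical(1), patrn = c(0, 0, 1, 1, 1, 1))

# drop = FALSE keeps the result a data frame even with a single column
res <- dat[, keep, drop = FALSE]
```

Here vapply() replaces unlist(lapply(...)) only to make the expected result
type explicit; the selection itself is the same.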

What you printed in your post is a matrix, not a data frame. If your
structure is a matrix, try the following.

  # your matrix
  mat <- as.matrix(df)
  mat

        a b c d e
   [1,] 1 1 1 0 0
   [2,] 1 1 0 0 0
   [3,] 1 1 0 0 0
   [4,] 0 1 1 1 1
   [5,] 0 1 1 1 1
   [6,] 1 1 1 1 1
   [7,] 1 1 1 0 1
   [8,] 1 1 1 0 1
   [9,] 1 1 1 0 0
  [10,] 1 1 1 0 0

  # selection of columns
  sel <- apply(mat, 2, testoccur1, patrn=c(0,0,1,1,1,1))
  mat[, sel]

        a c e
   [1,] 1 1 0
   [2,] 1 0 0
   [3,] 1 0 0
   [4,] 0 1 1
   [5,] 0 1 1
   [6,] 1 1 1
   [7,] 1 1 1
   [8,] 1 1 1
   [9,] 1 1 0
  [10,] 1 1 0
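
As a cross-check for 0/1 data like yours, the same test can be sketched with
strings. This is a different technique from occur1, not one of the functions
from the thread, and it assumes that every entry prints as a single character
(otherwise neighbouring entries could run together and give false matches).
The names has_pattern_str, m01 and sel_str are made up for this example.

```r
# Hypothetical string-based test; valid only for single-character entries
has_pattern_str <- function(exmpl, patrn)
{
  # collapse both vectors to strings and search for a literal substring
  grepl(paste(patrn, collapse = ""), paste(exmpl, collapse = ""), fixed = TRUE)
}

# two columns of the example data as a matrix
m01 <- cbind(a = c(1, 1, 1, 0, 0, 1, 1, 1, 1, 1),
             d = c(0, 0, 0, 1, 1, 1, 0, 0, 0, 0))

sel_str <- apply(m01, 2, has_pattern_str, patrn = c(0, 0, 1, 1, 1, 1))
m01[, sel_str, drop = FALSE]
```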

Hope this helps.

Petr Savicky.


