[R] use sliding window to count substrings found in large string

Gabor Grothendieck ggrothendieck at gmail.com
Wed Jul 7 19:25:49 CEST 2010


On Wed, Jul 7, 2010 at 1:15 PM, Immanuel <mane.desk at googlemail.com> wrote:
> Hey,
>
> big help, thanks!
> One little question remains: if I create
> more than one string and table ...
> ---------------------
>
> # generate an input string n long
> set.seed(123)
> n <- 300
> lets_1 <- paste(sample(letters[1:5], n, replace = TRUE), collapse = "")
> lets_2 <- paste(sample(letters[1:5], n, replace = TRUE), collapse = "")
>
>
> # get rolling k-length sequences and count
> k <- 3
> table_1 <-table(substring(lets_1, 1:(n-k+1), k:n))
> table_2 <-table(substring(lets_2, 1:(n-k+1), k:n))
> -----------------------
>
> is it possible to manipulate table_1 so that it contains zero entries
> for all the substrings found in table_2 but not in table_1?
>
> best regards
> Immanuel
>

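Turn them into factors that share the same levels (the union of the
substrings found in both strings) before counting them with table.
table reports a zero for any level that does not occur: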

# generate an input string n long
set.seed(123)
n <- 300
lets_1 <- paste(sample(letters[1:5], n, replace = TRUE), collapse = "")
lets_2 <- paste(sample(letters[1:5], n, replace = TRUE), collapse = "")

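# get rolling k-length sequences
k <- 3
s1 <- substring(lets_1, 1:(n-k+1), k:n)
s2 <- substring(lets_2, 1:(n-k+1), k:n)

# common levels: every substring seen in either string
# (union already drops duplicates, so unique is not needed)
levs <- sort(union(s1, s2))

# count with explicit levels so that absent substrings show up as 0
table_1 <- table(factor(s1, levels = levs))
table_2 <- table(factor(s2, levels = levs))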
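
To see the zero entries appear, here is a minimal sketch on tiny
hand-made vectors. The data and the name counts are made up for
illustration and are not part of the original question; the only
calls used are base R's factor, table and rbind.

```r
# Hypothetical toy data, small enough to check the counts by eye
s1 <- c("abc", "abd", "abc")
s2 <- c("abd", "bcd")

# Common set of levels: every substring seen in either vector
levs <- sort(union(s1, s2))

# factor() with explicit levels makes table() keep absent levels as 0
table_1 <- table(factor(s1, levels = levs))
table_2 <- table(factor(s2, levels = levs))

table_1   # "bcd" is listed with a count of 0
table_2   # "abc" is listed with a count of 0

# Both tables have the same names in the same order,
# so they can be stacked into one matrix of counts
counts <- rbind(table_1, table_2)
counts
```

Because both tables are built from the same levels, they line up
element by element, which is what makes comparing or stacking them
straightforward.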


