[R] use sliding window to count substrings found in large string

Gabor Grothendieck ggrothendieck at gmail.com
Wed Jul 7 19:26:52 CEST 2010


On Wed, Jul 7, 2010 at 1:25 PM, Gabor Grothendieck
<ggrothendieck at gmail.com> wrote:
> On Wed, Jul 7, 2010 at 1:15 PM, Immanuel <mane.desk at googlemail.com> wrote:
>> Hey,
>>
>> big help, thanks!
>> One little question remains, if I create
>> more than one string and table ...
>> ---------------------
>>
>> # generate an input string n long
>> set.seed(123)
>> n <- 300
>> lets_1 <- paste(sample(letters[1:5], n, replace = TRUE), collapse = "")
>> lets_2 <- paste(sample(letters[1:5], n, replace = TRUE), collapse = "")
>>
>>
>> # get rolling k-length sequences and count
>> k <- 3
>> table_1 <- table(substring(lets_1, 1:(n-k+1), k:n))
>> table_2 <- table(substring(lets_2, 1:(n-k+1), k:n))
>> -----------------------
>>
>> is it possible to manipulate table_1 so that it contains zero entries
>> for all the substrings found in table_2 but not in table_1?
>>
>> best regards
>> Immanuel
>>
>
> Turn them into factors with the appropriate levels before counting
> them with table:
>
> # generate an input string n long
> set.seed(123)
> n <- 300
> lets_1 <- paste(sample(letters[1:5], n, replace = TRUE), collapse = "")
> lets_2 <- paste(sample(letters[1:5], n, replace = TRUE), collapse = "")
>
> # get rolling k-length sequences and count
> k <- 3
> s1 <- substring(lets_1, 1:(n-k+1), k:n)
> s2 <- substring(lets_2, 1:(n-k+1), k:n)
> levs <- sort(union(s1, s2))
> table(factors(s1, levs))
> table(factors(s2, levs))
>

That should be factor, not factors (base R has no function named
factors, so the two table() calls above fail as written):

table(factor(s1, levs))
table(factor(s2, levs))
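
For reference, here is the whole thing as one self-contained sketch with
the correction applied. The names starts, table_1 and table_2 are just
illustrative choices, and passing levels = by name is a stylistic
preference rather than a requirement. Substrings that occur in only one
of the two strings should show up with a count of 0 in the other table.

```r
# generate two input strings, each n long
set.seed(123)
n <- 300
k <- 3
lets_1 <- paste(sample(letters[1:5], n, replace = TRUE), collapse = "")
lets_2 <- paste(sample(letters[1:5], n, replace = TRUE), collapse = "")

# start positions of the rolling k-length windows
starts <- seq_len(n - k + 1)
s1 <- substring(lets_1, starts, starts + k - 1)
s2 <- substring(lets_2, starts, starts + k - 1)

# shared levels: every substring seen in either string
levs <- sort(union(s1, s2))

# counting factors (not raw strings) keeps unused levels as zero counts
table_1 <- table(factor(s1, levels = levs))
table_2 <- table(factor(s2, levels = levs))

# both tables now cover the same substrings in the same order
identical(names(table_1), names(table_2))

# substrings found in lets_2 but not in lets_1 (may be empty)
names(table_1)[table_1 == 0]
```

Because both tables share the same names in the same order, they can be
compared or combined elementwise, e.g. cbind(table_1, table_2).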


