[R] interval between specific characters in a string...

@vi@e@gross m@iii@g oii gm@ii@com
Sat Dec 3 05:01:44 CET 2022


Evan, there are oodles of ways to do many things in R, and much of what the
tidyverse supplies can often be done as easily, or easier, outside it.

Before presenting a solution, I need to make sure I am answering the same
question or problem you intend.

Here is the string you have as an example:

st <- "abaaabbaaaaabaaab"

Are you looking for occurrences of the single character "b" in a string
whose other characters are all "a", or at least not "b", where the string
can be of any length but has at least a few characters?

If so, ONE METHOD is to convert the string to a vector of single characters,
for reasons that will become clear. Because strsplit() returns a list, the
result has to be wrapped in unlist():

> unlist(strsplit(st,""))
[1] "a" "b" "a" "a" "a" "b" "b" "a" "a" "a" "a" "a" "b" "a" "a" "a" "b"

The result is a vector whose elements you can compare with "b", which gives
a TRUE/FALSE vector:

> unlist(strsplit(st,"")) == "b"
 [1] FALSE  TRUE FALSE FALSE FALSE  TRUE  TRUE FALSE FALSE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE
[17]  TRUE

Now you can ask for the indices which are TRUE, meaning at what offset from
the beginning are there instances of the letter "b":

> which(unlist(strsplit(st,"")) == "b")
[1]  2  6  7 13 17

This shows that the integer offsets for the letter "b" are the second,
sixth and so on to the seventeenth. Again, if I understood you, you want a
measure of how far apart instances of "b" are, with adjacent ones being 1
apart. Again, there are many methods, but I chose one that shifts the above
values one place to the right by putting a zero at the front and dropping
the last entry.

So save the indices in a variable first:

indices <- which(unlist(strsplit(st,"")) == "b")
indices_shifted <- c(0, head(indices, -1))

The two contain:

> indices
[1]  2  6  7 13 17
> indices_shifted
[1]  0  2  6  7 13
> indices - indices_shifted 
[1] 2 4 1 6 4

The last line, the element-wise difference of the two vectors, is the same
as your intended result.
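
As an aside, the subtraction of the shifted vector can probably be written
more compactly with base R's diff(), which returns the differences between
successive elements. This is a minimal sketch of that alternative, not the
method used above; the variable name gaps is my own choice for illustration.

```r
st <- "abaaabbaaaaabaaab"

# Positions of "b" in the string, found the same way as above
indices <- which(unlist(strsplit(st, "")) == "b")

# Prepending 0 makes the first gap count from the start of the string
gaps <- diff(c(0, indices))
gaps
```

For the example string this should give the same five values as
indices - indices_shifted.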

If you want to be cautious, handle edge cases like not having any "b" or an
empty string.
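
One way to be cautious is to wrap the steps in a small function that checks
its input. This is only a sketch: the function name b_gaps and its argument
ch are hypothetical names introduced here, and diff(c(0L, indices)) is used
as a compact way to compute the same differences as subtracting the shifted
vector.

```r
# Hypothetical helper, not part of the original solution
b_gaps <- function(st, ch = "b") {
  # Guard: expect exactly one non-missing character string
  stopifnot(is.character(st), length(st) == 1, !is.na(st))

  # Positions at which the character ch occurs
  indices <- which(unlist(strsplit(st, "")) == ch)

  # With no matches, indices is empty and so is the result
  diff(c(0L, indices))
}

b_gaps("abaaabbaaaaabaaab")  # 2 4 1 6 4
b_gaps("aaaa")               # empty result, no "b" present
b_gaps("")                   # empty result, empty string
```

A zero-length result is then the signal that no "b" was found, which the
caller can test with length().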

Here is the consolidated code:

st <- "abaaabbaaaaabaaab"
indices <- which(unlist(strsplit(st,"")) == "b")
indices_shifted <- c(0, head(indices, -1))
result <- indices - indices_shifted

There are many other ways to do this and of course some are more
straightforward and some more complex.

Consider a loop over a vector version of the string where, each time you see
a "b", you put out the number representing the gap since the last index
where you saw one and then remember the new index.

Fairly low tech.
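
The loop just described might look something like the following. Treat it as
a rough sketch; the names chars, last_seen and gaps are mine, chosen only
for illustration.

```r
st <- "abaaabbaaaaabaaab"
chars <- unlist(strsplit(st, ""))

last_seen <- 0L       # index of the most recent "b"; 0 means start of string
gaps <- integer(0)    # grows by one entry each time a "b" is found

for (i in seq_along(chars)) {
  if (chars[i] == "b") {
    gaps <- c(gaps, i - last_seen)  # distance from the previous "b"
    last_seen <- i                  # remember where this one was
  }
}
gaps
```

Growing a vector inside a loop is fine for short strings like this one, but
the vectorized version above is likely the better choice for long input.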


-----Original Message-----
From: R-help <r-help-bounces using r-project.org> On Behalf Of Evan Cooch
Sent: Friday, December 2, 2022 12:19 PM
To: r-help using r-project.org
Subject: [R] interval between specific characters in a string...

Was wondering if there is an 'efficient/elegant' way to do the following
(without tidyverse). Take a string

abaaabbaaaaabaaab

It's easy enough to count the number of times the character 'b' shows up in
the string, but...what I'm looking for is outputting the 'intervals'
between occurrences of 'b' (starting the counter at the beginning of the
string). So, for the preceding example, 'b' shows up in positions

2, 6, 7, 13, 17

So, the interval data would be: 2, 4, 1, 6, 4

My main approach has been to simply output positions (say, something like
unlist(gregexpr('b', target_string))), and 'do the math' between successive
positions. Can anyone suggest a more elegant approach?

Thanks in advance...
