[R] Data parsing question: adding characters within a string of characters

Gabor Grothendieck ggrothendieck at gmail.com
Thu Jan 2 13:55:22 CET 2014


On Wed, Jan 1, 2014 at 10:55 PM, Joshua Banta <jbanta at uttyler.edu> wrote:
> Dear Listserve,
>
> I have a data-parsing question for you. I recognize this is more in the domain of Perl/Python, but I don't know those languages! On the other hand, I am pretty good overall with R, so I'd rather get the job done within the R "ecosphere."
>
> Here is what I want to do. Consider the following data:
>
> string <- "ATCGCCCGTA[AGA]TAACCG"
>
> I want to alter string so that it looks like this:
>
> ATCGCCCGTA[A][G][A]TAACCG
>
> In other words, I want to design a piece of code that will scan a character string, find each bracketed group of characters, wrap every character inside that group in its own pair of brackets, and put the result back in place of the original group. The bracketed groups will vary in length, but the transformation is the same in every case.
>
> So, for example, another string may look like this:
>
> string2 <- "ATTATACGCA[AAATGCCCCA]GCTA[AT]GCATTA"
>
> I want to alter string2 so that it looks like this:
>
> "ATTATACGCA[A][A][A][T][G][C][C][C][C][A]GCTA[A][T]GCATTA"
>

Here is a one-line solution using the gsubfn package:

> library(gsubfn)
> gsubfn("\\[([^]]+)\\]", ~ paste(paste0("[", strsplit(x, "")[[1]], "]"), collapse = ""), string)
[1] "ATCGCCCGTA[A][G][A]TAACCG"

> gsubfn("\\[([^]]+)\\]", ~ paste(paste0("[", strsplit(x, "")[[1]], "]"), collapse = ""), string2)
[1] "ATTATACGCA[A][A][A][T][G][C][C][C][C][A]GCTA[A][T]GCATTA"
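
For comparison, the same transformation can probably be done in base R without gsubfn, using gregexpr() to locate the bracketed groups and the replacement form of regmatches() to rewrite them. What follows is only a sketch: the helper name bracket_each is made up for illustration and is not part of any package, and it assumes the brackets are balanced and not nested, as in the two example strings.

```r
# Base-R sketch; bracket_each is a hypothetical helper name, not a library function.
# Assumes brackets are balanced and never nested.
bracket_each <- function(s) {
  # Find every bracketed group such as "[AGA]"
  m <- gregexpr("\\[[^]]+\\]", s)
  regmatches(s, m) <- lapply(regmatches(s, m), function(groups) {
    # Strip the outer brackets from each matched group
    inner <- substr(groups, 2, nchar(groups) - 1)
    # Split into single characters and wrap each one in its own brackets
    vapply(strsplit(inner, ""),
           function(ch) paste0("[", ch, "]", collapse = ""),
           character(1))
  })
  s
}

string  <- "ATCGCCCGTA[AGA]TAACCG"
string2 <- "ATTATACGCA[AAATGCCCCA]GCTA[AT]GCATTA"
bracket_each(string)
bracket_each(string2)
```

The pattern is the same one used in the gsubfn call above, minus the capture group; the brackets are removed with substr() instead. The gsubfn version is shorter, so this is mainly of interest if adding a package dependency is a concern.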



