[R] Data parsing question: adding characters within a string of characters

Frede Aakmann Tøgersen frtog at vestas.com
Thu Jan 2 12:19:08 CET 2014


Hi Joshua

This is one way to do it. I am not sure whether this is an efficient implementation for your needs; that depends on the size of your data.


string1 <- "ATCGCCCGTA[AGA]TAACCG"
string2 <- "ATTATACGCA[AAATGCCCCA]GCTA[AT]GCATTA"

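foo <- function(genes){

    # Wrap each character of x in its own brackets: c("A", "T") -> "[A][T]"
    mypaste <- function(x) paste("[", paste(x, collapse = "]["), "]", sep = "")

    # Split on punctuation (here, the brackets) to get all segments
    tmp <- strsplit(genes, "[[:punct:]]")[[1]]
    # Positions of the opening and closing brackets
    str <- gregexpr("\\[", genes)[[1]]
    stp <- gregexpr("\\]", genes)[[1]]
    # Contents of each bracketed group
    tmp2 <- substring(genes,  str + 1, stp - 1)

    # Find the bracketed groups among the segments. match() returns the first
    # hit only, so this assumes no group repeats an earlier segment.
    ndx <- match(tmp2, tmp)
    tmp[ndx] <- lapply(strsplit(tmp2, ""), mypaste)
    result <- paste(tmp, collapse = "")

    return(result)
}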

> foo(string2)
[1] "ATTATACGCA[A][A][A][T][G][C][C][C][C][A]GCTA[A][T]GCATTA"
> foo(string1)
[1] "ATCGCCCGTA[A][G][A]TAACCG"
>
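
A possible alternative, offered only as a sketch: since match() in foo returns the first matching position, a bracketed group that repeats an earlier segment (for example "AT[AT]") would be replaced in the wrong place. The version below avoids the lookup by editing the matches in place with gregexpr() and regmatches<-(). The name split_brackets is made up for this illustration, and the code assumes brackets are well-formed and never nested.

```r
# Hypothetical helper (not from the original post): put every character of
# each bracketed group into its own pair of brackets.
# Assumes well-formed, non-nested brackets.
split_brackets <- function(x) {
    # Locate every "[...]" group in each element of x
    m <- gregexpr("\\[[^]]*\\]", x)
    groups <- regmatches(x, m)
    # Rewrite each group: drop the outer brackets, then bracket each character
    regmatches(x, m) <- lapply(groups, function(g) {
        inner <- substring(g, 2, nchar(g) - 1)
        vapply(strsplit(inner, ""),
               function(ch) paste0("[", ch, "]", collapse = ""),
               character(1))
    })
    x
}

string1 <- "ATCGCCCGTA[AGA]TAACCG"
string2 <- "ATTATACGCA[AAATGCCCCA]GCTA[AT]GCATTA"
split_brackets(c(string1, string2))
```

Because gregexpr() and regmatches() accept character vectors, the function can be called on a whole vector of sequences at once rather than one string at a time.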

Yours sincerely / Med venlig hilsen


Frede Aakmann Tøgersen
Specialist, M.Sc., Ph.D.
Plant Performance & Modeling

Technology & Service Solutions
T +45 9730 5135
M +45 2547 6050
frtog at vestas.com
http://www.vestas.com


> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org]
> On Behalf Of Joshua Banta
> Sent: 2 January 2014 04:56
> To: R Help
> Subject: [R] Data parsing question: adding characters within a string of
> characters
> 
> Dear Listserve,
> 
> I have a data-parsing question for you. I recognize this is more in the domain
> of PERL/Python, but I don't know those languages! On the other hand, I am
> pretty good overall with R, so I'd rather get the job done within the R
> "ecosphere."
> 
> Here is what I want to do. Consider the following data:
> 
> string <- "ATCGCCCGTA[AGA]TAACCG"
> 
> I want to alter string so that it looks like this:
> 
> ATCGCCCGTA[A][G][A]TAACCG
> 
> In other words, I want to design a piece of code that will scan a character
> string, find bracketed groups of characters, break up each character within
> the bracket into its own individual bracketed character, and then put the
> group of individually bracketed characters back into the character string. The
> lengths of the character strings enclosed by a bracket will vary, but in every
> case, I want to do the same thing: break up each character within the bracket
> into its own individual bracketed character, and then put the group of
> individually bracketed characters back into the character string.
> 
> So, for example, another string may look like this:
> 
> string2 <- "ATTATACGCA[AAATGCCCCA]GCTA[AT]GCATTA"
> 
> I want to alter string2 so that it looks like this:
> 
> "ATTATACGCA[A][A][A][T][G][C][C][C][C][A]GCTA[A][T]GCATTA"
> 
> Thank you all in advance and have a great 2014!
> 
> -----------------------------------
> Josh Banta, Ph.D
> Assistant Professor
> Department of Biology
> The University of Texas at Tyler
> Tyler, TX 75799
> Tel: (903) 565-5655
> http://plantevolutionaryecology.org
> 