[R] Deleting subsequences from a string sequence

arun smartpink111 at yahoo.com
Thu Jan 23 20:04:48 CET 2014

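CDS1 <- read.table("CDS coordinates.txt", header=FALSE)
# group consecutive rows into (start, end) pairs: 1,1,2,2,3,3,...
CDS2 <- split(CDS1[,1], as.numeric(as.character(gl(nrow(CDS1),2,length=nrow(CDS1)))))
eya4 <- readChar("eya4_lagan_HM_cp.txt", file.info("eya4_lagan_HM_cp.txt")$size)
# one element per character; head(..., -1) drops the final character of the file
eyaSpl <- head(strsplit(eya4,"")[[1]], -1)
length(eyaSpl)
#[1] 311522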

# Query 1: overwrite every position of each pair (both ends inclusive) with "#"
eyaSpl1 <- eyaSpl
for(i in seq_along(CDS2)){
  eyaSpl1[seq(CDS2[[i]][1],CDS2[[i]][2],by=1)] <- "#"
}

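Masking each position with "#" leaves a run of "#" as long as the range, whereas Query 1 asks for a single "#" per pair. One possible way to get that, shown on toy data, is to turn the first position of each range into the marker and drop the rest. The names `seqChars`, `pairs` and `replace_ranges` are made up for this illustration and stand in for `eyaSpl` and `CDS2`; the sketch assumes the pairs do not overlap.

```r
# Toy stand-ins for eyaSpl and CDS2 (hypothetical data, not from the thread)
seqChars <- strsplit("ABCDEFGHIJ", "")[[1]]
pairs <- list(c(2, 4), c(7, 8))

# Replace each inclusive range with a single marker character.
# Assumes the ranges do not overlap.
replace_ranges <- function(chars, pairs, mark = "#") {
  keep <- rep(TRUE, length(chars))
  for (p in pairs) {
    chars[p[1]] <- mark                               # first position becomes the marker
    if (p[2] > p[1]) keep[(p[1] + 1):p[2]] <- FALSE   # remaining positions are dropped
  }
  paste(chars[keep], collapse = "")
}

res1 <- replace_ranges(seqChars, pairs)
res1
#[1] "A#EF#IJ"
```

On the real data the call would presumably be `replace_ranges(eyaSpl, CDS2)`.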
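# Query 2: "#" slots go before each start and after each end; earlier insertions
# shift later ones right by one (pairs assumed sorted and non-overlapping)
eyaSpl2 <- rep("#", sum(length(eyaSpl), length(CDS1[,1])))
vec1 <- unlist(lapply(CDS2, function(x) c(x[1], x[2]+1)), use.names=FALSE)
vec1 <- vec1 + seq_along(vec1) - 1
eyaSpl2[-vec1] <- eyaSpl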
eyaSpl2New <- paste(eyaSpl2,collapse="")
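
The insertion indices for Query 2 are easy to get off by one, so a toy check may help. The sketch below uses made-up data (`seqChars`, `pairs`) and a hypothetical helper `flank_ranges`; it assumes the pairs are sorted in ascending order and do not overlap. Each "#" goes before an original position (the start, or the end plus one), and every earlier insertion pushes later ones one slot to the right.

```r
# Toy stand-ins for eyaSpl and CDS2 (hypothetical data, not from the thread)
seqChars <- strsplit("ABCDEFGHIJ", "")[[1]]
pairs <- list(c(2, 4), c(7, 8))

# Insert a marker before each start and after each end.
# Assumes pairs are sorted ascending and non-overlapping.
flank_ranges <- function(chars, pairs, mark = "#") {
  # original positions immediately before which a marker is inserted
  ins <- unlist(lapply(pairs, function(p) c(p[1], p[2] + 1)), use.names = FALSE)
  # the k-th marker is shifted right by the k-1 markers placed before it
  idx <- ins + seq_along(ins) - 1
  out <- rep(mark, length(chars) + length(ins))
  out[-idx] <- chars                 # fill the non-marker slots with the sequence
  paste(out, collapse = "")
}

res2 <- flank_ranges(seqChars, pairs)
res2
#[1] "A#BCD#EF#GH#IJ"
```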


I have a data file here, which is imported into R by: 

    eya4_lagan_HM_cp <- "E:/blahblah/eya4_lagan_HM_cp.txt" 
    eya4_lagan_HM_cp <- readChar(eya4_lagan_HM_cp, file.info(eya4_lagan_HM_cp)$size) 

Label the first character as position 1 and the last character
as position 311,522 (the sequence contains 311,522 characters in
total). I have two closely related queries.

**Query 1)** 

I also have a data file with a list of positions, which are read in pairs. Take the first pair, 44184
and 44216, as an example. I wish to delete the subsequence from position
44184 (inclusive) to position 44216 (inclusive) from the previous
sequence `eya4_lagan_HM_cp` and insert the character # in its place. In other words, substitute the subsequence from 44184 to 44216 with #. I
would like to do the same for the remaining pairs: for 151795
and 151844, delete positions 151795 (inclusive) to 151844
(inclusive) in `eya4_lagan_HM_cp` and replace them with #, and so on.

**Query 2)** 

Now I would like to do something slightly different with the
same list of positions. Take the first pair as an example
again. I would like to insert a # right before position 44184, that is, between positions 44183 and 44184 in
`eya4_lagan_HM_cp`, and a # right after position 44216, that is, between positions 44216 and 44217. I would like to repeat this procedure for all position pairs, so the next pair gets a # right before 151795 and a # right after 151844.

Thank you. 
