[R] String processing - is there a better way

Davis, Brian Brian.Davis at uth.tmc.edu
Wed Jul 21 19:02:58 CEST 2010


I have a two-part question.

Part 1) 
I am trying to remove characters from a string based on the position of a key character in another string.  I have a solution that works, but it requires a for-loop.  A vectorized way of doing this has eluded me.  

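CleanRead <- function(x, y) {

  # Coerce both inputs to character vectors
  if (!is.character(x)) 
    x <- as.character(x)
  if (!is.character(y)) 
    y <- as.character(y)

  # Elements of x that contain at least one "*", and the star positions in each
  idx <- grep("\\*", x, value = FALSE)
  starpos <- gregexpr("\\*", x[idx])
  
  # Split the matching elements of y into single characters
  ysplit <- strsplit(y[idx], '')
  # Blank out the characters of y that sit at the star positions of x
  # (seq_along avoids the 1:0 trap when no element of x contains a star)
  for (i in seq_along(idx)) {
    ysplit[[i]][starpos[[i]]] <- ""
  }

  # Reassemble each string; vapply returns character(0) when idx is empty
  y[idx] <- vapply(ysplit, paste, character(1), collapse = '')
  return(y)
}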

x<-c("AA*.*A,,,", "**a.a*,,,A", "C*c..", "**aA") 
y<-c("abcdefghi", "abcdefghij", "abcde", "abcd")

CleanRead(x,y)
[1] "abdfghi" "cdeghij" "acde"    "cd"


Is there a better way to do this?

Part 2) 
My next step in the string processing is to take each character in the output of CleanRead and subtract 33 from its ASCII value to obtain an integer. Again I have a solution that works: it splits the string into characters, converts them to a factor (with levels starting at ASCII 34), and uses unclass to get the integer codes (kind of an atoi(x)-33 all in one step).
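
For concreteness, here is a rough sketch of the factor approach described above. It is not my exact code: the name CharToScore is made up for illustration, the upper end of the level set (ASCII 126) is an assumption, and as.integer stands in for unclass so the result carries no levels attribute. A character below ASCII 34 would come back as NA in this sketch.

```r
# Hypothetical helper illustrating the factor-based conversion.
CharToScore <- function(s) {
  # Characters for ASCII codes 34..126, in order, so level k is code 33 + k
  # (the upper bound 126 is an assumption)
  lev <- strsplit(rawToChar(as.raw(34:126)), "")[[1]]
  # One integer vector per input string
  lapply(strsplit(s, ""), function(ch) as.integer(factor(ch, levels = lev)))
}

scores <- CharToScore(c("abdfghi", "cd"))
```

Here "a" is ASCII 97, so it maps to 64, and "cd" maps to 66 67.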

I looked for an R equivalent of C's atoi, but the only help I could find (R-help 2003) suggested using as.numeric.  However, the help file (and testing) shows that as.numeric returns 'NA' for these characters.   

Am I missing an easier way to do this?



Thanks in advance,

Brian


