[R] "ACCTGMX" to "1223400" in R?
    David Winsemius 
    dwinsemius at comcast.net
       
    Tue Jul 20 03:37:01 CEST 2010
    
    
  
On Jul 19, 2010, at 5:31 PM, John1983 wrote:
>
> Hi,
>
> I am a newbie in R and was working on some DNA data represented as
> strings of A,C,T and G (also wild-character like M and X). I use the
> Bioconductor package in R.
Well, I guess Bioconductor is sort of a "meta" package, but it is really
more of a subculture: a project with many packages of its own. It also
has its own mailing list.
> Currently I need to convert a string of the form "ACCTGMX" to
> "1223400" i.e. A is replaced by 1, C with 2, T with 3, G with 4 and
> any other character with a 0. I checked with 'replace' and also with a
> function called 'copySubstitute' found in the Biobase package but this
> is only for files.
> The data here is a string ("ACCTGMX" ) and we need to convert it to
> yet another string ("1223400"). Now I use the strsplit function to split
> "ACCTGM" into "A" "C" "C" "T" "G" "M" and then use 'which' to assign
> the corresponding numbers.
> Is there a faster way to do this or some function I can make use of?
 > tst <- rep( "ACCTGMX", 5)
 > newtst <- gsub("A", "1", tst)
 > newtst <- gsub("C", "2", newtst)
 > newtst <- gsub("T", "3", newtst)
 > newtst <- gsub("G", "4", newtst)
 > newtst <- gsub("[[:alpha:]]", "0", newtst)
 > newtst
[1] "1223400" "1223400" "1223400" "1223400" "1223400"
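Since each base maps to exactly one character, the repeated gsub calls
above can probably be collapsed with base R's chartr, which translates
characters one-for-one. A minimal sketch, assuming the input strings
contain no digits to begin with. The name dna_to_digits is made up for
illustration, and the final pattern "[^1-4]" (instead of "[[:alpha:]]")
is my own choice so that non-letters such as "-" also become 0:

```r
# Hypothetical helper: map A/C/T/G to 1/2/3/4 and everything else to 0.
# Assumes the input strings contain no digits before translation.
dna_to_digits <- function(x) {
  translated <- chartr("ACTG", "1234", x)  # one-for-one character translation
  gsub("[^1-4]", "0", translated)          # any remaining character becomes 0
}

tst <- rep("ACCTGMX", 5)
dna_to_digits(tst)
```

Both chartr and gsub are vectorized over x, so this works on a whole
character vector at once, just like the step-by-step version.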
There is also a rollapply function in the zoo package and an strapply
function in the gsubfn package that might be even more powerful, but I am
insufficiently talented to give you a one-liner using them.
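For comparison, the split-and-look-up approach described in the question
can be written in base R without either of those packages. This is only
a sketch: the names codes and encode_by_lookup are hypothetical, and it
will likely be slower than the gsub version on long vectors because it
loops over every string.

```r
# Hypothetical lookup table: names are the bases, values are the digits.
codes <- c(A = "1", C = "2", T = "3", G = "4")

# Hypothetical helper: split each string into characters, look each one up,
# replace anything not in the table with "0", and paste the pieces back.
encode_by_lookup <- function(x) {
  pieces <- strsplit(x, "", fixed = TRUE)   # list of character vectors
  out <- vapply(pieces, function(ch) {
    digits <- codes[ch]                     # NA for characters not in the table
    digits[is.na(digits)] <- "0"
    paste(digits, collapse = "")
  }, character(1))
  unname(out)
}

encode_by_lookup(c("ACCTGMX", "GAT"))
```

The named vector keeps the mapping in one place, which may be handy if
the coding scheme changes later.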
>
> Please advise.
>
> Thank you.
> -- 
-- 
David Winsemius, MD
West Hartford, CT
    
    