[R] "ACCTGMX" to "1223400" in R?

David Winsemius dwinsemius at comcast.net
Tue Jul 20 03:37:01 CEST 2010


On Jul 19, 2010, at 5:31 PM, John1983 wrote:

>
> Hi,
>
> I am a newbie in R and was working on some DNA data represented as strings
> of A, C, T and G (also wildcard characters like M and X). I use the
> Bioconductor package in R.

Well, I guess it's sort of a "meta" package (a collection of packages),  
but it is really more of a subculture. It also has its own mailing list.

> Currently I need to convert a string of the form "ACCTGMX" to
> "1223400", i.e. A is replaced by 1, C with 2, T with 3, G with 4 and any
> other character with a 0. I checked with 'replace' and also with a function
> called 'copySubstitute' found in the Biobase package, but this is only for
> files.
> The data here is a string ("ACCTGMX") and we need to convert it to yet
> another string ("1223400"). Now I use the strsplit function to split
> "ACCTGM" into "A" "C" "C" "T" "G" "M" and then use 'which' to assign the
> corresponding numbers.
> Is there a faster way to do this or some function I can make use of?
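
For comparison, the split-and-look-up approach described above might be sketched roughly as follows. This is only a guess at what such code could look like, not the poster's actual code: the helper name encode_dna is hypothetical, and a named lookup vector stands in for the 'which' step.

```r
# Hypothetical helper (not from the original post): split each string into
# single characters, look each one up in a named vector, paste back together.
encode_dna <- function(x) {
  codes <- c(A = "1", C = "2", T = "3", G = "4")
  vapply(strsplit(x, "", fixed = TRUE), function(chars) {
    out <- codes[chars]       # lookup by name; unknown letters give NA
    out[is.na(out)] <- "0"    # any other character becomes "0"
    paste(out, collapse = "")
  }, character(1))
}

encode_dna(c("ACCTGMX", "GATTACA"))
```

This works on a whole character vector at once, but it loops over every string in R code, so it is likely to be slower than the vectorized replacements on long inputs.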

 > tst <- rep( "ACCTGMX", 5)
 > newtst <- gsub("A", "1", tst)
 > newtst <- gsub("C", "2", newtst)
 > newtst <- gsub("T", "3", newtst)
 > newtst <- gsub("G", "4", newtst)
 > newtst <- gsub("[[:alpha:]]", "0", newtst)
 > newtst
[1] "1223400" "1223400" "1223400" "1223400" "1223400"
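
Note that the order of the gsub calls above matters: the catch-all "[[:alpha:]]" replacement has to come last, after A, C, T and G have already been turned into digits. A more compact sketch of the same idea uses base R's chartr to translate the four letters in one pass, followed by a single gsub for everything else. It assumes the input is upper case and contains no digits to begin with (a literal 1 to 4 in the input would pass through unchanged); the variable names coded and newtst2 are just for illustration.

```r
tst <- rep("ACCTGMX", 5)

# Translate A->1, C->2, T->3, G->4 character by character in one call
coded <- chartr("ACTG", "1234", tst)

# Anything that is not now one of 1-4 was some other character: make it 0
newtst2 <- gsub("[^1-4]", "0", coded)
newtst2
```

If lower-case letters may occur, wrapping the input in toupper() first would be one way to handle them.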

There is also a rollapply function in the zoo package and an strapply  
function in the gsubfn package that might be even more powerful, but I am  
insufficiently talented to give you a one-liner using them.

>
> Please advise.
>
> Thank you.
> -- 
-- 

David Winsemius, MD
West Hartford, CT


