[R] how to separate char and num within a variable

Marc Schwartz marc_schwartz at comcast.net
Fri Feb 6 00:39:53 CET 2009


on 02/05/2009 05:20 PM Bill Hyman wrote:
> Hi all,
> 
> I read in a column which looks like "chr1:000889594-000889638", and
> need to break them into three columns like "chr1:", "000889594" and
> "000889638". How shall I do in R. Thanks a lot for your suggestions!

See ?strsplit

Vec <- "chr1:000889594-000889638"

> Vec
[1] "chr1:000889594-000889638"

# Use a regular expression, defining the 'split' character
# as either ":" or "-", where the vertical bar means 'or':
> strsplit(Vec, split = ":|-")
[[1]]
[1] "chr1"      "000889594" "000889638"


Note that the split characters are not retained in the result.
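
Since the original request asked for "chr1:" with the colon kept, one
possible way to put it back is to paste it onto the first piece after
splitting. This is only a sketch; the name 'parts' is made up here for
illustration and is not from the original question:

```r
Vec <- "chr1:000889594-000889638"

# Split on ":" or "-" and take the single list element
parts <- strsplit(Vec, split = ":|-")[[1]]

# Re-attach the colon that strsplit() dropped from the first piece
parts[1] <- paste0(parts[1], ":")

parts
```

The same idea should carry over to a whole column by applying paste0()
to the first column of the split result.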

Let's presume that you have a column in a data frame of the original
data and wish to split it into 3 columns:

DF <- data.frame(Col = rep(Vec, 10))

> DF
                        Col
1  chr1:000889594-000889638
2  chr1:000889594-000889638
3  chr1:000889594-000889638
4  chr1:000889594-000889638
5  chr1:000889594-000889638
6  chr1:000889594-000889638
7  chr1:000889594-000889638
8  chr1:000889594-000889638
9  chr1:000889594-000889638
10 chr1:000889594-000889638

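Note that in versions of R before 4.0.0, 'Col' will be a factor by
default, whereas from 4.0.0 onward it will be a character vector.
strsplit() expects a character vector, so we coerce with as.character()
(harmless if 'Col' is already character) and use do.call() to create a
character matrix, via rbind(), from the result: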

> do.call(rbind, strsplit(as.character(DF$Col), split = ":|-"))
      [,1]   [,2]        [,3]
 [1,] "chr1" "000889594" "000889638"
 [2,] "chr1" "000889594" "000889638"
 [3,] "chr1" "000889594" "000889638"
 [4,] "chr1" "000889594" "000889638"
 [5,] "chr1" "000889594" "000889638"
 [6,] "chr1" "000889594" "000889638"
 [7,] "chr1" "000889594" "000889638"
 [8,] "chr1" "000889594" "000889638"
 [9,] "chr1" "000889594" "000889638"
[10,] "chr1" "000889594" "000889638"


See ?regex, ?do.call and ?rbind for more information.
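
If a data frame with three columns is wanted rather than a character
matrix, something along these lines may work. It is a sketch under
assumptions: the names 'Mat', 'Res', 'Chr', 'Start' and 'End' are
hypothetical, and it assumes every entry splits into exactly three
pieces and that the two positions should be numeric (which drops the
leading zeros):

```r
Vec <- "chr1:000889594-000889638"
DF <- data.frame(Col = rep(Vec, 10), stringsAsFactors = FALSE)

# Character matrix with one row per entry, as in the reply above
Mat <- do.call(rbind, strsplit(as.character(DF$Col), split = ":|-"))

# Build a data frame with named columns (names are illustrative only);
# as.numeric() turns "000889594" into 889594
Res <- data.frame(Chr   = Mat[, 1],
                  Start = as.numeric(Mat[, 2]),
                  End   = as.numeric(Mat[, 3]),
                  stringsAsFactors = FALSE)

str(Res)
```

If the leading zeros matter, leave out the as.numeric() calls and keep
the two position columns as character.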

HTH,

Marc Schwartz



