[R] how to separate char and num within a variable

Bill Hyman billhyman1 at yahoo.com
Fri Feb 6 01:46:47 CET 2009


Thanks a lot!



----- Original Message ----
From: Marc Schwartz <marc_schwartz at comcast.net>
To: Bill Hyman <billhyman1 at yahoo.com>
Cc: r-help at r-project.org
Sent: Thursday, February 5, 2009 3:39:53 PM
Subject: Re: [R] how to separate char and num within a variable

on 02/05/2009 05:20 PM Bill Hyman wrote:
> Hi all,
> 
> I read in a column whose values look like "chr1:000889594-000889638", and
> I need to break each into three columns like "chr1:", "000889594" and
> "000889638". How should I do this in R? Thanks a lot for your suggestions!

See ?strsplit

Vec <- "chr1:000889594-000889638"

> Vec
[1] "chr1:000889594-000889638"

# Use a regular expression, defining the 'split' character
# as either ":" or "-", where the vertical bar means 'or':
> strsplit(Vec, split = ":|-")
[[1]]
[1] "chr1"      "000889594" "000889638"


Note that the split characters are not retained in the result.
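If the ":" really is wanted on the first piece, as in the original question, one possible approach is simply to paste it back on after splitting. This is only a sketch; the name 'parts' is made up for illustration:

```r
Vec <- "chr1:000889594-000889638"

# Split on ":" or "-" as above, taking the first (and only) list element
parts <- strsplit(Vec, split = ":|-")[[1]]

# Re-attach the ":" that strsplit() dropped from the first piece
parts[1] <- paste0(parts[1], ":")

parts
```

The other two pieces are left untouched, so only the chromosome label carries the separator.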

Let's presume that you have a column in a data frame of the original
data and wish to split it into 3 columns:

DF <- data.frame(Col = rep(Vec, 10))

> DF
                        Col
1  chr1:000889594-000889638
2  chr1:000889594-000889638
3  chr1:000889594-000889638
4  chr1:000889594-000889638
5  chr1:000889594-000889638
6  chr1:000889594-000889638
7  chr1:000889594-000889638
8  chr1:000889594-000889638
9  chr1:000889594-000889638
10 chr1:000889594-000889638

Note that by default (in R versions before 4.0.0, where stringsAsFactors
defaults to TRUE), 'Col' will be a factor, whereas strsplit() expects a
character vector. Thus we coerce with as.character() and use do.call() to
create a character matrix, via rbind(), from the resulting list:

> do.call(rbind, strsplit(as.character(DF$Col), split = ":|-"))
      [,1]   [,2]        [,3]
 [1,] "chr1" "000889594" "000889638"
 [2,] "chr1" "000889594" "000889638"
 [3,] "chr1" "000889594" "000889638"
 [4,] "chr1" "000889594" "000889638"
 [5,] "chr1" "000889594" "000889638"
 [6,] "chr1" "000889594" "000889638"
 [7,] "chr1" "000889594" "000889638"
 [8,] "chr1" "000889594" "000889638"
 [9,] "chr1" "000889594" "000889638"
[10,] "chr1" "000889594" "000889638"


See ?regex, ?do.call and ?rbind for more information.
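The character matrix can then be turned into a data frame with three columns. The following is a minimal sketch; the names 'Mat', 'DF2', 'Chr', 'Start' and 'End' are my own choices for illustration, and converting the positions to numeric is an assumption about what is wanted (it drops the leading zeros):

```r
Vec <- "chr1:000889594-000889638"
DF <- data.frame(Col = rep(Vec, 10))

# Character matrix with one row per original value, as shown above
Mat <- do.call(rbind, strsplit(as.character(DF$Col), split = ":|-"))

# Build a three-column data frame; the column names are hypothetical
DF2 <- data.frame(Chr   = Mat[, 1],
                  Start = as.numeric(Mat[, 2]),
                  End   = as.numeric(Mat[, 3]),
                  stringsAsFactors = FALSE)

str(DF2)
```

If the leading zeros matter, the second and third columns could be left as character by omitting as.numeric().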

HTH,

Marc Schwartz



