[R] splitting a vector of strings...

Jonathan Greenberg greenberg at ucdavis.edu
Fri Oct 23 06:50:41 CEST 2009


William et al:

    Thanks!  I think I have a somewhat more complicated issue due to the 
type of string I'm using -- the split is " | " (space pipe space) -- how 
do I code that based on your sub code below?  Using " | *" doesn't seem 
to be working.  Thanks!
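In R's regular expressions "|" is the alternation operator. The pattern " | *" therefore means "a single space, or zero or more spaces" and never matches a literal pipe. Below is a minimal sketch of a fix. It assumes every element contains exactly one " | " separator, and the vector s is made-up example data. The pipe is escaped as "\\|" in the two sub() patterns, and strsplit() is given fixed = TRUE so the separator is matched literally.

```r
# Made-up example data: each element holds exactly one " | " separator
s <- c("alpha | one", "beta | two", "gamma | three")

# "|" is the alternation operator in R regular expressions, so " | *"
# matches " " or " *" rather than a literal pipe. Escaping it as "\\|"
# makes the pipe a literal character.
left  <- sub(" \\| .*", "", s)  # drop the separator and everything after it
right <- sub(".* \\| ", "", s)  # drop everything up to and including the separator

# strsplit() can skip regular expressions entirely with fixed = TRUE
parts  <- strsplit(s, " | ", fixed = TRUE)
left2  <- vapply(parts, `[`, character(1), 1)  # first piece of each element
right2 <- vapply(parts, `[`, character(1), 2)  # second piece of each element

# left is c("alpha", "beta", "gamma"); right is c("one", "two", "three")
stopifnot(identical(left, left2), identical(right, right2))
```

A bracket expression is an equivalent way to make the pipe literal, e.g. sub(" [|] .*", "", s).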

--j

William Dunlap wrote:
>> -----Original Message-----
>> From: r-help-bounces at r-project.org 
>> [mailto:r-help-bounces at r-project.org] On Behalf Of Jonathan Greenberg
>> Sent: Thursday, October 22, 2009 7:35 PM
>> To: r-help
>> Subject: [R] splitting a vector of strings...
>>
>> Quick question -- if I have a vector of strings that I'd like to split
>> into two new vectors based on a substring that is inside of each string,
>> what is the most efficient way to do this?  The substring that I want to
>> split on is multiple characters, if that matters, and it is contained in
>> every element of the character vector.
>>     
>
> strsplit and sub can both be used for this.  If you know
> the string will be split into 2 parts then 2 calls to sub
> with slightly different patterns will do it.  strsplit requires
> less fiddling with the pattern and is handier when the number
> of parts is variable or large.  strsplit's output often needs to
> be rearranged for convenient use.
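The point about a variable number of parts and rearranging strsplit()'s output can be illustrated with a small sketch. The input vector is hypothetical, and the rearrangements shown are picking the first and last piece and padding into a matrix with NA. They are one possible choice, not the method used in the reply.

```r
# Hypothetical input with a variable number of "qaz"-separated parts
xy <- c("AqazB", "CqazDqazE", "F")

parts <- strsplit(xy, "qaz", fixed = TRUE)
n <- lengths(parts)  # number of pieces in each element

# Rearrangement 1: first and last piece of each element
first <- vapply(parts, function(p) p[1], character(1))
last  <- vapply(parts, function(p) p[length(p)], character(1))

# Rearrangement 2: pad every element with NA to the same width, then
# transpose so each input string becomes one row of a character matrix
width <- max(n)
m <- t(vapply(parts,
              function(p) c(p, rep(NA_character_, width - length(p))),
              character(width)))
```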
>
> E.g., I made 100,000 strings with a 'qaz' in their middles with
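>   x<-paste("X",sample(1e5),sep="")  # 1e5 distinct strings "X<number>"
>   y<-sub("X","Y",x)                 # the same strings with a leading "Y"
>   xy<-paste(x,y,sep="qaz")          # e.g. "X42qazY42"
> and split them by the 'qaz' in two ways:
>   system.time(ret1<-list(x=sub("qaz.*","",xy),y=sub(".*qaz","",xy)))  # two calls to sub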
>   # user  system elapsed 
>   # 0.22    0.00    0.21 
>   system.time({
>     tmp<-strsplit(xy,"qaz")  # one strsplit, then pick piece 1 and piece 2
>     ret2<-list(x=unlist(lapply(tmp,`[`,1)),y=unlist(lapply(tmp,`[`,2)))
>   })
>   # user  system elapsed 
>   # 2.42    0.00    2.20 
>   identical(ret1,ret2)
>   #[1] TRUE
>   identical(ret1$x,x) && identical(ret1$y,y)
>   #[1] TRUE
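In the timing example, every element splits into exactly two pieces. Under that assumption the strsplit() output can also be rearranged without lapply(). unlist() flattens it into x1, y1, x2, y2, ..., and matrix(..., byrow = TRUE) reshapes that into two columns. The sketch below rebuilds the data the same way as above. It passes fixed = TRUE so "qaz" is treated as a literal string rather than a regular expression. Its speed relative to the two versions above has not been measured.

```r
# Rebuild the example data as above: 1e5 strings with "qaz" in the middle
x  <- paste("X", sample(1e5), sep = "")
y  <- sub("X", "Y", x)
xy <- paste(x, y, sep = "qaz")

# Each element yields exactly 2 pieces, so the flattened vector alternates
# x-piece, y-piece and can be filled row-wise into a 2-column matrix
m <- matrix(unlist(strsplit(xy, "qaz", fixed = TRUE)), ncol = 2, byrow = TRUE)
ret3 <- list(x = m[, 1], y = m[, 2])
```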
>
> Bill Dunlap
> Spotfire, TIBCO Software
> wdunlap tibco.com 
>
>   
>> --j
>>
>> -- 
>>
>> Jonathan A. Greenberg, PhD
>> Postdoctoral Scholar
>> Center for Spatial Technologies and Remote Sensing (CSTARS)
>> University of California, Davis
>> One Shields Avenue
>> The Barn, Room 250N
>> Davis, CA 95616
>> Phone: 415-763-5476
>> AIM: jgrn307, MSN: jgrn307 at hotmail.com, Gchat: jgrn307
