[R] splitting a vector of strings...

andrew andrewjohnroyal at gmail.com
Fri Oct 23 06:58:20 CEST 2009


The following works: a double backslash escapes the |, removing its
"or" (alternation) meaning in a regex.  (Bill Dunlap showed that you
don't need sapply for it to work.)

xs <- "this is | string"
xsv <- paste(xs, 1:10)
strsplit(xsv, "\\|")
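
For the " | " (space pipe space) separator asked about in the quoted message, here is a minimal sketch, assuming the separator appears once per string with a single space on each side. The unescaped pattern " | *" contains no literal pipe: it reads as "a space, or zero or more spaces", so it never matches the | itself. Either escape the pipe or pass fixed = TRUE so the pattern is taken literally. The vector xsv2 is made-up example data, not from the thread.

```r
# Hypothetical example data: "this is | string 1", ..., "this is | string 3"
xsv2 <- paste("this is | string", 1:3)

# Regex version: only the pipe needs escaping; the spaces are literal.
parts_regex <- strsplit(xsv2, " \\| ")

# Literal version: fixed = TRUE switches off regex interpretation.
parts_fixed <- strsplit(xsv2, " | ", fixed = TRUE)

# Both approaches should give the same list of pieces.
identical(parts_regex, parts_fixed)
parts_regex[[1]]
```

With fixed = TRUE there is nothing to escape, which is probably the less error-prone choice when the separator is a plain string.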

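To get two separate vectors rather than a list, the two-call sub approach from the quoted 'qaz' example can be adapted to the " | " separator. This is only a sketch, assuming exactly one separator per string; xsv2 and the left/right names are made up for illustration.

```r
# Hypothetical example data with " | " as the separator.
xsv2 <- paste("this is | string", 1:3)

# Two calls to sub(): drop everything from the separator onward,
# or everything up to and including it.
left  <- sub(" \\| .*", "", xsv2)
right <- sub(".* \\| ", "", xsv2)

# The same two vectors from strsplit(), taking piece 1 and piece 2
# of each split string.
tmp    <- strsplit(xsv2, " | ", fixed = TRUE)
left2  <- vapply(tmp, `[`, character(1), 1)
right2 <- vapply(tmp, `[`, character(1), 2)

identical(left, left2) && identical(right, right2)
```

vapply is used here in place of unlist(lapply(...)) only because it checks that each piece is a single string; the result should be the same.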

On Oct 23, 3:50 pm, Jonathan Greenberg <greenb... at ucdavis.edu> wrote:
> William et al:
>
>     Thanks!  I think I have a somewhat more complicated issue due to the
> type of string I'm using -- the split is " | " (space pipe space) -- how
> do I code that based on your sub code below?  Using " | *" doesn't seem
> to be working.  Thanks!
>
> --j
>
>
>
> William Dunlap wrote:
> >> -----Original Message-----
> >> From: r-help-boun... at r-project.org
> >> [mailto:r-help-boun... at r-project.org] On Behalf Of Jonathan Greenberg
> >> Sent: Thursday, October 22, 2009 7:35 PM
> >> To: r-help
> >> Subject: [R] splitting a vector of strings...
>
> >> Quick question -- if I have a vector of strings that I'd like
> >> to split
> >> into two new vectors based on a substring that is inside of
> >> each string,
> >> what is the most efficient way to do this?  The substring
> >> that I want to
> >> split on is multiple characters, if that matters, and it is
> >> contained in
> >> every element of the character vector.
>
> > strsplit and sub can both be used for this.  If you know
> > the string will be split into 2 parts then 2 calls to sub
> > with slightly different patterns will do it.  strsplit requires
> > less fiddling with the pattern and is handier when the number
> > of parts is variable or large.  strsplit's output often needs to
> > be rearranged for convenient use.
>
> > E.g., I made 100,000 strings with a 'qaz' in their middles with
> >   x<-paste("X",sample(1e5),sep="")
> >   y<-sub("X","Y",x)
> >   xy<-paste(x,y,sep="qaz")
> > and split them by the 'qaz' in two ways:
> >   system.time(ret1<-list(x=sub("qaz.*","",xy),y=sub(".*qaz","",xy)))
> >   # user  system elapsed
> >   # 0.22    0.00    0.21
>
> >   system.time({tmp<-strsplit(xy,"qaz");ret2<-list(x=unlist(lapply(tmp,`[`,1)),y=unlist(lapply(tmp,`[`,2)))})
> >   # user  system elapsed
> >   # 2.42    0.00    2.20
> >   identical(ret1,ret2)
> >   #[1] TRUE
> >   identical(ret1$x,x) && identical(ret1$y,y)
> >   #[1] TRUE
>
> > Bill Dunlap
> > Spotfire, TIBCO Software
> > wdunlap tibco.com
>
> >> --j
>
> >> --
>
> >> Jonathan A. Greenberg, PhD
> >> Postdoctoral Scholar
> >> Center for Spatial Technologies and Remote Sensing (CSTARS)
> >> University of California, Davis
> >> One Shields Avenue
> >> The Barn, Room 250N
> >> Davis, CA 95616
> >> Phone: 415-763-5476
> >> AIM: jgrn307, MSN: jgrn... at hotmail.com, Gchat: jgrn307
>
> >> ______________________________________________
> >> R-h... at r-project.org mailing list
> >> https://stat.ethz.ch/mailman/listinfo/r-help
> >> PLEASE do read the posting guide
> >> http://www.R-project.org/posting-guide.html
> >> and provide commented, minimal, self-contained, reproducible code.
>



