[R] Regexp subexpression

Gabor Grothendieck ggrothendieck at gmail.com
Sun Mar 26 06:08:58 CEST 2006


Here is yet another solution:

strsplit(sub(pat, '\\1 \\2', patid), split = " ")

or perhaps:

do.call("rbind", strsplit(sub(pat, '\\1 \\2', patid), split = " "))
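
For readers without the earlier messages at hand, here is a minimal self-contained sketch of the two calls above. The patid values are made up for illustration (the thread's actual input is not shown in this message), and pat is assumed to be the version with the |.* alternative from the quoted message further down.

```r
# Hypothetical input: two well-formed IDs and one that does not match.
patid <- c("ALAN334", "AzD44", "99")

# Pattern from the earlier message; the trailing |.* lets non-matching
# strings be replaced by empty captures instead of being returned unchanged.
pat <- "^([[:alpha:]]+)([[:digit:]]+)|.*"

# Put a space between the two captured groups, then split on it.
parts <- strsplit(sub(pat, "\\1 \\2", patid), split = " ")

# Bind the pieces into a character matrix, one row per ID.
result <- do.call("rbind", parts)
colnames(result) <- c("txt", "num")
```

For the non-matching element the substituted string is a single space, which strsplit turns into one empty string; rbind then recycles it across both columns, so that row ends up as two empty strings.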


On 3/25/06, Gabor Grothendieck <ggrothendieck at gmail.com> wrote:
> Here are some additional variations:
>
> > read.table(textConnection(sub(pat, '"\\1" "\\2"', patid)), as.is = TRUE)
>    V1  V2
> 1 ALAN 334
> 2  AzD  44
> 3       NA
>
> > read.table(textConnection(sub(pat, '"\\1" "\\2"', patid)), colClasses = "character")
>    V1  V2
> 1 ALAN 334
> 2  AzD  44
> 3
>
>
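> Note that in the first result element 3,1 is the empty string and 3,2 is NA,
> which occurs since the empty string is not numeric. In the second result
> both columns are read as character, so both elements are empty strings.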
>
> On 3/25/06, Gabor Grothendieck <ggrothendieck at gmail.com> wrote:
> > Here is one more variation. This time we provide an alternative .*
> > to soak up the entire expression when it would have otherwise
> > failed so that the substitution occurs regardless giving us
> > empty strings instead of the same string back:
> >
> > > pat = "^([[:alpha:]]+)([[:digit:]]+)|.*"
> > > sapply(sprintf("\\%d", 1:2), sub, pattern = pat, x = patid)
> >     \\1    \\2
> > [1,] "ALAN" "334"
> > [2,] "AzD"  "44"
> > [3,] ""     ""
> >
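> > If NAs are needed, use the same result[regexpr(pat, patid) < 0,] <- NA
> > as last time, but with the original pat (without the |.* alternative),
> > since the pattern above matches every string.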
> >
> > On 3/25/06, Gabor Grothendieck <ggrothendieck at gmail.com> wrote:
> > > We could use sapply to reduce it slightly:
> > >
> > > result <- sapply(sprintf("\\%d", 1:2), sub, pattern = pat, x = patid)
> > > result[regexpr(pat, patid) < 0,] <- NA
> > >
> > >
> > > On 3/25/06, Dieter Menne <dieter.menne at menne-biomed.de> wrote:
> > > > Gabor Grothendieck <ggrothendieck <at> gmail.com> writes:
> > > >
> > > > >
> > > > > In the third case there is no match so there are no
> > > > > substitutions.  Handle it separately:
> > > > >
> > > > > pat = "^([[:alpha:]]+)([[:digit:]]+)"
> > > > > result <- cbind(txt = sub(pat, "\\1", patid), num = sub(pat, "\\2", patid))
> > > > > > result[regexpr(pat, patid) < 0,] <- NA
> > > > >
> > > >
> > > > Thanks, Gabor, that is something like a compressed version of mine.  My main
> > > > question was whether I was missing something obvious, because I found the
> > > > double sub messy. I am surprised that there is nothing like
> > > >
> > > > pat = "^([[:alpha:]]+)([[:digit:]]+)"
> > > > mygrep(pat, patid)
> > > >
> > > > returning a list with all subexpressions.
> > > >
> > > > Dieter
> > > >
> > > >
> > >
> >
>
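
On Dieter's wish for a function returning all subexpressions: later releases of base R provide regexec() and regmatches(), which together come close to the hypothetical mygrep() in the quoted message. They were not available when this thread was written. A sketch, again with made-up patid values:

```r
# Hypothetical input; the third element does not match the pattern.
patid <- c("ALAN334", "AzD44", "99")
pat <- "^([[:alpha:]]+)([[:digit:]]+)"

# regexec() records the positions of the full match and of each
# parenthesized group; regmatches() extracts the substrings as a list.
m <- regmatches(patid, regexec(pat, patid))

# Each matching element is c(full match, group 1, group 2); a non-match
# yields character(0), which is replaced by a row of NAs here.
result <- t(sapply(m, function(x) {
  if (length(x)) x[-1] else c(NA_character_, NA_character_)
}))
colnames(result) <- c("txt", "num")
```

This avoids the double sub and the separate regexpr step, at the cost of handling the zero-length result for non-matching strings explicitly.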



