[R] Regexp subexpression

Gabor Grothendieck ggrothendieck at gmail.com
Sat Mar 25 18:12:39 CET 2006


In the third case there is no match, so sub() makes no substitution and
returns the string unchanged.  Handle that case separately:

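patid <- c("ALAN334", "AzD44", "44AZD")
pat <- "^([[:alpha:]]+)([[:digit:]]+)"
result <- cbind(txt = sub(pat, "\\1", patid), num = sub(pat, "\\2", patid))
# sub() leaves non-matching strings unchanged, so set those rows to NA
result[regexpr(pat, patid) < 0, ] <- NA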
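
A minimal sketch of a single-pass alternative, assuming a later version of R than the one this thread was written against: regexec() and regmatches() were added to base R afterwards, so they may not exist in the R release discussed here. regexec() finds the whole match and each parenthesized subexpression in one call, and a non-matching string simply yields no captures, so the NA row falls out without a separate check. The helper name group() is hypothetical, introduced only for this illustration.

```r
patid <- c("ALAN334", "AzD44", "44AZD")
pat <- "^([[:alpha:]]+)([[:digit:]]+)"

# regexec() locates the whole match plus each parenthesized subexpression;
# regmatches() returns a list with one element per input string:
# c(whole match, group 1, group 2), or character(0) when there is no match
m <- regmatches(patid, regexec(pat, patid))

# hypothetical helper: pick the n-th capture group, NA where nothing matched
group <- function(n) {
  vapply(m, function(x) if (length(x)) x[n + 1] else NA_character_, character(1))
}

result <- data.frame(txt = group(1), num = group(2), stringsAsFactors = FALSE)
print(result)
```

This returns the data frame asked for in the question, with NA in both columns for "44AZD"; wrap num in as.integer() if a numeric column is wanted.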


On 3/25/06, Dieter Menne <dieter.menne at menne-biomed.de> wrote:
> I can't translate Perl-style subexpression capture to R. Following, for example,
> B. Ripley's
>
> http://finzi.psych.upenn.edu/R/Rhelp02a/archive/58984.html
>
> I am using sub, but it looks like an ugly substitute. Assume I want to
> extract the first alpha part and the first numeric part, but only if they
> are in sequence.
>
> Do I really have to use sub twice, first extracting the first variable,
> then the second? The third example should return nothing, because its
> parts are in the wrong order, but it returns the whole string. I know I
> could check that separately, but is there no better way?
>
>  patid=c("ALAN334","AzD44","44AZD")
>  txt =sub("([[:alpha:]]+)([[:digit:]])+","\\1",patid)
>  num =sub("([[:alpha:]]+)([[:digit:]])+","\\2",patid)
>
> It would be nice if the following data frame were returned:
>
> txt     num
> ALAN    334
> AzD     44
> NA      NA (or "", "", but not so nice)
>
> Dieter
>

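A related detail in the pattern from the question: there the `+` sits outside the second group, `([[:digit:]])+`, so the group matches a single digit and is repeated, and the backreference keeps only the last repetition. The reply above puts the `+` inside the group, `([[:digit:]]+)`, so the group captures the whole run of digits. A small sketch comparing the two (the variable names are only for illustration):

```r
x <- "ALAN334"

# '+' outside the group: the group matches one digit per repetition,
# and \\2 holds only the last repetition
last_digit <- sub("([[:alpha:]]+)([[:digit:]])+", "\\2", x)

# '+' inside the group: the group itself captures the full run of digits
all_digits <- sub("([[:alpha:]]+)([[:digit:]]+)", "\\2", x)

print(c(last_digit, all_digits))
```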