[R] Regexp subexpression

Dieter Menne dieter.menne at menne-biomed.de
Sat Mar 25 17:22:52 CET 2006


I can't work out how to translate Perl-style subexpression extraction to R.
Following, for example, B. Ripley's post at

http://finzi.psych.upenn.edu/R/Rhelp02a/archive/58984.html

I am using sub, but it looks like an ugly substitute. Assume I want to
extract the first alpha part and the first numeric part, but only if they
are in sequence.

Do I really have to call sub twice, first to extract the first part and then
the second? The third example should return nothing, because the parts are in
the wrong order, but sub returns the whole string when nothing matches. I know
I could check for that separately, but is there no better way?

  patid <- c("ALAN334", "AzD44", "44AZD")
  # "+" sits inside the second group so that \\2 holds all digits, not just the last
  txt <- sub("([[:alpha:]]+)([[:digit:]]+)", "\\1", patid)
  num <- sub("([[:alpha:]]+)([[:digit:]]+)", "\\2", patid)

It would be nice if the following data frame would be returned:

txt     num
ALAN    334
AzD     44
NA      NA (or "", "", but not so nice)
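
A minimal sketch of one way such a data frame might be built in a single
pass, assuming an R version whose base package provides regexec() and
regmatches(). The "^" anchor is an assumption: it encodes "only if they are
in sequence" as "the alpha part must start the string". The names pat, m,
parts and result are made up for illustration.

```r
patid <- c("ALAN334", "AzD44", "44AZD")

# Assumption: the alpha part must start the string, hence the "^" anchor.
pat <- "^([[:alpha:]]+)([[:digit:]]+)"

# One list element per input string: c(full match, group 1, group 2),
# or character(0) when the pattern does not match.
m <- regmatches(patid, regexec(pat, patid))

# Keep the two groups, or a pair of NAs for non-matching strings.
parts <- vapply(
  m,
  function(x) if (length(x) == 3) x[2:3] else c(NA_character_, NA_character_),
  character(2)
)

# parts is a 2 x n matrix: row 1 is the alpha part, row 2 the numeric part.
result <- data.frame(txt = parts[1, ], num = parts[2, ],
                     stringsAsFactors = FALSE)
print(result)
```

Unlike sub, which hands back the input unchanged when nothing matches,
regmatches() returns an empty vector for a non-match, so the inverted
"44AZD" can be mapped to NA without a separate check.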

Dieter