[R] Regex question to find a string that contains 5-9 alpha-numeric characters, at least one of which is a number

Barry Rowlingson b.rowlingson at lancaster.ac.uk
Tue Jun 9 00:27:08 CEST 2009


On Mon, Jun 8, 2009 at 10:40 PM, Tan, Richard<RTan at panagora.com> wrote:
> Hi,
>
> This is not exactly an R question but I am trying to use gsub to replace
> a string that contains 5-9 alpha-numeric characters, at least one of
> which is a number.  Is there a good way to write it in a one line regex?

 The only way I can think of is to spell out all the possible
expressions, something like:

[0-9][a-z0-9]{4} | [a-z0-9][0-9][a-z0-9]{3} |
[a-z0-9]{2}[0-9][a-z0-9]{2} .... and so on (the spaces around each |
are only there for readability). That is, have a regex component for
every possible 5, 6, 7, 8, and 9 character expression with [0-9] in
each place. I'm not sure this qualifies as 'good', though.
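
For a sense of how large that spelled-out expression gets, here is a rough sketch that builds it programmatically rather than by hand. The names alnum, rep_class, alts and pattern are made up for illustration, and the class is upper-case only so that it can be tried on upper-case test strings; treat it as a demonstration of the idea, not a recommendation.

```r
# Hypothetical helper names; upper-case class chosen to suit upper-case test strings.
alnum <- "[A-Z0-9]"

# k repetitions of the alphanumeric class, or nothing when k is 0.
rep_class <- function(k) if (k > 0) sprintf("%s{%d}", alnum, k) else ""

# One alternative per (length n, digit position i): a digit forced at position i.
alts <- unlist(lapply(5:9, function(n) {
  vapply(seq_len(n), function(i) {
    paste0(rep_class(i - 1), "[0-9]", rep_class(n - i))
  }, character(1))
}))

# Anchor the whole alternation so the entire string has to match.
pattern <- paste0("^(", paste(alts, collapse = "|"), ")$")

length(alts)   # 5 + 6 + 7 + 8 + 9 = 35 alternatives

s <- c("SHRT", "5HRT", "M1TCH", "M1TCH5", "LONG3RS", "NONUMBER", "TOOLOOOONGG")
grepl(pattern, s)
```

Thirty-five alternatives for one condition is a fair illustration of why this does not scale nicely.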

 Better to do it in two stages, one to check for 5-9 alphanumerics,
and then another to check for a number.

Here's something on a test vector 's':

> cbind(s,grepl("^[A-Z0-9]{5,9}$",s),grepl("[0-9]",s))
     s
[1,] "SHRT"        "FALSE" "FALSE"
[2,] "5HRT"        "FALSE" "TRUE"
[3,] "M1TCH"       "TRUE"  "TRUE"
[4,] "M1TCH5"      "TRUE"  "TRUE"
[5,] "LONG3RS"     "TRUE"  "TRUE"
[6,] "NONUMBER"    "TRUE"  "FALSE"
[7,] "TOOLOOOONGG" "FALSE" "FALSE"

 The ones you want give two TRUE values. Extending to lower-case is
left as an exercise...
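
A minimal sketch of combining the two checks into one logical vector, with a-z added to the class as one way of covering lower case. The vector s is re-created here from the values printed above. The second half is an assumption that goes beyond the two-stage approach: with perl = TRUE, a PCRE lookahead can fold both conditions into a single pattern, and the last lines show a hypothetical gsub call on free text (txt and the replacement "X" are invented for the example).

```r
s <- c("SHRT", "5HRT", "M1TCH", "M1TCH5", "LONG3RS", "NONUMBER", "TOOLOOOONGG")

# Stage 1: the whole string is 5-9 alphanumerics (a-z added for lower case).
len_ok <- grepl("^[A-Za-z0-9]{5,9}$", s)
# Stage 2: there is at least one digit somewhere in the string.
has_digit <- grepl("[0-9]", s)

# The wanted strings are the ones that pass both checks.
wanted <- len_ok & has_digit
s[wanted]

# Assumption: PCRE lookahead (perl = TRUE) as a one-pattern alternative.
# (?=.*[0-9]) requires a digit ahead without consuming any characters.
one_pass <- grepl("^(?=.*[0-9])[A-Za-z0-9]{5,9}$", s, perl = TRUE)
identical(wanted, one_pass)

# Hypothetical gsub use: replace such tokens inside a longer string.
# \\b marks word boundaries; note that "_" also counts as a word character.
txt <- "see M1TCH and NONUMBER"
out <- gsub("\\b(?=[A-Za-z0-9]*[0-9])[A-Za-z0-9]{5,9}\\b", "X", txt, perl = TRUE)
out   # "see X and NONUMBER"
```

The two-stage version works with the default regex engine; the lookahead version depends on perl = TRUE.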

Barry



