[R] Regex question to find a string that contains 5-9 alpha-numeric characters, at least one of which is a number

Marc Schwartz marc_schwartz at me.com
Tue Jun 9 02:32:33 CEST 2009


On Jun 8, 2009, at 5:27 PM, Barry Rowlingson wrote:

> On Mon, Jun 8, 2009 at 10:40 PM, Tan, Richard <RTan at panagora.com> wrote:
>> Hi,
>>
>> This is not exactly an R question but I am trying to use gsub to replace
>> a string that contains 5-9 alpha-numeric characters, at least one of
>> which is a number.  Is there a good way to write it in a one line regex?
>
> The only way I can think of is to spell out all the possible
> expressions, something like:
>
> [0-9][a-z0-9]{4} | [a-z0-9][0-9][a-z0-9]{3} |
> [a-z0-9]{2}[0-9][a-z0-9]{2} .... and so on. That is, have a regex
> component for every possible 5, 6, 7, 8, and 9 character expression
> with [0-9] in each place. I'm not sure this qualifies as 'good',
> though..
>
> Better to do it in two stages, one to check for 5-9 alphanumerics,
> and then another to check for a number.
>
> Here's something on a test vector 's':
>
>> cbind(s,grepl("^[A-Z0-9]{5,9}$",s),grepl("[0-9]",s))
>     s
> [1,] "SHRT"        "FALSE" "FALSE"
> [2,] "5HRT"        "FALSE" "TRUE"
> [3,] "M1TCH"       "TRUE"  "TRUE"
> [4,] "M1TCH5"      "TRUE"  "TRUE"
> [5,] "LONG3RS"     "TRUE"  "TRUE"
> [6,] "NONUMBER"    "TRUE"  "FALSE"
> [7,] "TOOLOOOONGG" "FALSE" "FALSE"
>
> The ones you want give two TRUE values. Extending to lower-case is
> left as an exercise...
>
> Barry
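
For what it is worth, the two checks Barry describes can be combined into a single logical vector with `&`. The following is only a minimal sketch: the test strings are retyped from the output above, `ok` is an illustrative name, and swapping in [[:alnum:]] is one possible way to handle lower-case (note that it is locale-dependent and may also accept accented letters).

```r
# Test strings retyped from the example output above
s <- c("SHRT", "5HRT", "M1TCH", "M1TCH5", "LONG3RS", "NONUMBER", "TOOLOOOONGG")

# Stage 1: the whole string is 5-9 alphanumerics (upper or lower case)
# Stage 2: the string contains at least one digit
ok <- grepl("^[[:alnum:]]{5,9}$", s) & grepl("[0-9]", s)

s[ok]
# [1] "M1TCH"   "M1TCH5"  "LONG3RS"
```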


I was trying to think of a way to do this with only a single grep(),  
but it has been too long of a day.

So here is a bit of a simplification on the two stage approach:

 > vec
[1] "SHRT"        "5HRT"        "M1TCH"       "M1TCH5"      "LONG3RS"     "NONUMBER"    "TOOLOOOONGG"


 > grep("[0-9]", vec[grep("^[[:alnum:]]{5,9}$", vec)], value = TRUE)
[1] "M1TCH"   "M1TCH5"  "LONG3RS"
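
A single-pattern version may be possible with a lookahead, which needs perl = TRUE since the default regex engine does not support it. This is a hedged sketch rather than a tested recommendation: `hit`, `txt` and `out` are illustrative names, the "<X>" replacement is made up, and the gsub() call assumes the targets are whole words inside a longer string, which the original question does not spell out.

```r
vec <- c("SHRT", "5HRT", "M1TCH", "M1TCH5", "LONG3RS", "NONUMBER", "TOOLOOOONGG")

# (?=.*[0-9]) asserts that a digit occurs somewhere in the string;
# the rest requires the whole string to be 5-9 alphanumerics.
hit <- grepl("^(?=.*[0-9])[[:alnum:]]{5,9}$", vec, perl = TRUE)
vec[hit]
# [1] "M1TCH"   "M1TCH5"  "LONG3RS"

# For replacement inside running text, word boundaries stand in for ^ and $,
# and the lookahead is restricted to the alphanumeric run of the same word.
txt <- "id M1TCH and NONUMBER and 5HRT"
out <- gsub("\\b(?=[[:alnum:]]*[0-9])[[:alnum:]]{5,9}\\b", "<X>", txt, perl = TRUE)
out
# [1] "id <X> and NONUMBER and 5HRT"
```

Note that \b treats the underscore as a word character, so a token such as "M1TCH_2" would not be matched by the gsub() pattern.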


HTH,

Marc Schwartz



