[R] Regex question to find a string that contains 5-9 alpha-numeric characters, at least one of which is a number

Wacek Kusnierczyk Waclaw.Marcin.Kusnierczyk at idi.ntnu.no
Tue Jun 9 09:53:57 CEST 2009


Wacek Kusnierczyk wrote:
> Marc Schwartz wrote:
>   
>> On Jun 8, 2009, at 5:27 PM, Barry Rowlingson wrote:
>>
>>     
>>> On Mon, Jun 8, 2009 at 10:40 PM, Tan, Richard <RTan at panagora.com> wrote:
>>>       
>>>> Hi,
>>>>
>>>> This is not exactly an R question, but I am trying to use gsub to
>>>> replace a string that contains 5-9 alphanumeric characters, at least
>>>> one of which is a number. Is there a good way to write it as a
>>>> one-line regex?
>>>>         
>>> The only way I can think of is to spell out all the possible
>>> expressions, something like:
>>>
>>> [0-9][a-z0-9]{4}|[a-z0-9][0-9][a-z0-9]{3}|
>>> [a-z0-9]{2}[0-9][a-z0-9]{2} .... and so on. That is, have a regex
>>> component for every possible 5-, 6-, 7-, 8-, and 9-character expression
>>> with [0-9] in each place. I'm not sure this qualifies as 'good',
>>> though.
>>>       
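Barry's enumeration does not have to be typed out by hand. A minimal sketch, assuming perl=TRUE and swapping his [a-z0-9] and [0-9] for the POSIX classes [[:alnum:]] and [[:digit:]] (so upper-case letters count too); the names `branches` and `barry` are illustrative, not from the thread. It builds one branch per total length n (5 to 9) and per digit position k, 35 branches in all:

```r
# Illustrative sketch (names are hypothetical, not from the thread):
# one branch per total length n in 5..9 and per digit position k in 0..n-1.
# The other positions accept letters or digits, so a word qualifies as soon
# as it has at least one digit somewhere.
branches <- unlist(lapply(5:9, function(n) {
  k <- 0:(n - 1)
  sprintf('[[:alnum:]]{%d}[[:digit:]][[:alnum:]]{%d}', k, n - 1 - k)
}))
length(branches)  # 35

# \b on both sides forces the whole word to be 5-9 characters long.
barry <- sprintf('\\b(?:%s)\\b', paste(branches, collapse = '|'))

input <- c(
  none = '0foo foofoofoo0 0foofoofoo',
  some = 'foo00 f0o0o foob0 0foobardo')
out <- gsub(barry, 'x', input, perl = TRUE)
print(out)
# none is left unchanged; some becomes 'x x x x'
```

Words with several digits match more than one branch; that is harmless for gsub, but the pattern grows with every extra length and position.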
>
> something like this?
>
>     input = c(
>         none='0foo f0oo foo0 foo00 f0o0o foofoofoo0 0foofoofoo',
>         all='foob0 foo0b 0foob 0foobardo foob4rdoo foobardo0')
>
>     gsub(x=input, replacement='x', perl=TRUE,
>         pattern=paste(collapse='|',
>             sprintf('\\b[[:alpha:]-]{%d}[[:digit:]][[:alpha:]]{%d,%d}\\b',
>                 0:8, c(4:0, rep(0,4)), 8:0)))
>     # none -> '0foo f0oo foo0 foo00 f0o0o foofoofoo0 0foofoofoo'
>     # all -> 'x x x x x x'

Of course, the pattern should have been (no minus; inside the bracket
expression '[[:alpha:]-]' the trailing '-' is a literal hyphen, so
hyphens would count as letters):

    '\\b[[:alpha:]]{%d}[[:digit:]][[:alpha:]]{%d,%d}\\b'

vQ
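Note that the sprintf alternation admits exactly one digit: every position other than the [[:digit:]] one is [[:alpha:]], which is why foo00 and f0o0o sit in the 'none' group. If "at least one of which is a number" is meant literally, a Perl-style lookahead avoids enumerating positions altogether. A minimal sketch, assuming perl=TRUE and words made only of letters and digits:

```r
# Sketch: the lookahead (?=[[:alpha:]]*[[:digit:]]) only checks that a digit
# follows the word's leading letters, i.e. that the word contains a digit;
# [[:alnum:]]{5,9} between the two \b anchors then fixes the length.
pattern <- '\\b(?=[[:alpha:]]*[[:digit:]])[[:alnum:]]{5,9}\\b'

input <- c(
  none = '0foo f0oo foo0 foofoofoo0 0foofoofoo foobar',
  some = 'foo00 f0o0o foob0 0foobardo foobardo0')
out <- gsub(pattern, 'x', input, perl = TRUE)
print(out)
# none is left unchanged; some becomes 'x x x x x'
```

Because the lookahead consumes no characters, the digit test and the length test stay independent, and the pattern needs a single branch instead of one per digit position.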




More information about the R-help mailing list