[R] Regex question to find a string that contains 5-9 alpha-numeric characters, at least one of which is a number

Wacek Kusnierczyk Waclaw.Marcin.Kusnierczyk at idi.ntnu.no
Tue Jun 9 09:46:15 CEST 2009


Marc Schwartz wrote:
>
> On Jun 8, 2009, at 5:27 PM, Barry Rowlingson wrote:
>
>> On Mon, Jun 8, 2009 at 10:40 PM, Tan, Richard <RTan at panagora.com> wrote:
>>> Hi,
>>>
>>> This is not exactly an R question, but I am trying to use gsub to
>>> replace a string that contains 5-9 alpha-numeric characters, at least
>>> one of which is a number.  Is there a good way to write it in a
>>> one-line regex?
>>
>> The only way I can think of is to spell out all the possible
>> expressions, something like:
>>
>> [0-9][a-z0-9]{4}|[a-z0-9][0-9][a-z0-9]{3}|[a-z0-9]{2}[0-9][a-z0-9]{2}|...
>>
>> and so on. That is, have a regex component for every possible 5-, 6-,
>> 7-, 8- and 9-character expression with [0-9] in each place. I'm not
>> sure this qualifies as 'good', though.
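
Barry's enumeration does not have to be typed by hand; it can be generated. The sketch below is an illustration, not part of the thread: `barry_pattern` is a hypothetical helper, the character classes are copied from his example (so only lowercase ASCII letters count as letters), the sample string is made up, and the call assumes `perl = TRUE` (PCRE) for the non-capturing group `(?:...)`.

```r
# Hypothetical helper: spell out Barry's alternation for every length
# n = min_len..max_len and every digit position i = 0..(n-1).
barry_pattern <- function(min_len = 5, max_len = 9) {
  alts <- unlist(lapply(min_len:max_len, function(n) {
    i <- 0:(n - 1)  # number of characters before the [0-9]
    sprintf("[a-z0-9]{%d}[0-9][a-z0-9]{%d}", i, n - 1L - i)
  }))
  # \b on both ends: the whole word must match, not a piece of a longer one
  paste0("\\b(?:", paste(alts, collapse = "|"), ")\\b")
}

sample_text <- "0foo foo00 f0o0o foob0 0foobardo foofoofoo0 foobar"
result <- gsub(barry_pattern(), "x", sample_text, perl = TRUE)
print(result)
# "0foo x x x x foofoofoo0 foobar"
```

For lengths 5 to 9 this yields 5 + 6 + 7 + 8 + 9 = 35 alternatives. Because every non-digit slot is `[a-z0-9]`, words with several digits such as 'foo00' are also replaced.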

something like this?

    input = c(
        none='0foo f0oo foo0 foo00 f0o0o foofoofoo0 0foofoofoo',
        all='foob0 foo0b 0foob 0foobardo foob4rdoo foobardo0')

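    gsub(x=input, replacement='x', perl=TRUE,
        pattern=paste(collapse='|',
            # k = 0..8 letters, then one digit, then enough letters to give
            # 5-9 characters in total (exactly one digit overall)
            sprintf('\\b[[:alpha:]]{%d}[[:digit:]][[:alpha:]]{%d,%d}\\b',
                    0:8, c(4:0, rep(0, 4)), 8:0)))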
    # none -> '0foo f0oo foo0 foo00 f0o0o foofoofoo0 0foofoofoo'
    # all -> 'x x x x x x'
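
The pattern above allows exactly one digit, since every other character must be a letter; that is why 'foo00' and 'f0o0o' are left alone. If "at least one" digit is meant literally, a lookahead gives a shorter pattern. This is an illustrative variant, not part of the original reply; it needs `perl = TRUE`, because lookaheads are a PCRE feature that R's default regex engine does not provide.

```r
input <- c(
  none = '0foo f0oo foo0 foo00 f0o0o foofoofoo0 0foofoofoo',
  all  = 'foob0 foo0b 0foob 0foobardo foob4rdoo foobardo0')

# \\b ... \\b                 : whole words only
# (?=[[:alpha:]]*[[:digit:]]) : lookahead; a digit follows a (possibly empty)
#                               run of letters, i.e. the word has a digit
# [[:alnum:]]{5,9}            : the word itself is 5 to 9 letters/digits
at_least_one <- "\\b(?=[[:alpha:]]*[[:digit:]])[[:alnum:]]{5,9}\\b"

out <- gsub(at_least_one, "x", input, perl = TRUE)
print(out)
# none -> '0foo f0oo foo0 x x foofoofoo0 0foofoofoo'
# all -> 'x x x x x x'
```

Unlike the enumerated alternation, the length range lives in a single quantifier, so changing `{5,9}` is the only edit needed for other lengths.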

vQ