[R] Regular expression help

Marc Schwartz marc_schwartz at me.com
Tue Dec 8 00:15:35 CET 2009


On Dec 7, 2009, at 5:04 PM, Ramya wrote:

>
> Hi there
>
> I have a string like the one below and I want to extract 9831019 from it.
> I used the regular expression \d+, but with it I only get as far as the 7
> before it returns.
> This type of number (9831019) can appear in any part of the string and
> always has more than 5 digits, and I want to use that as a condition.
>
> UV7C11-F9-E1 MCS#9831019
> MCS Lot #9512516"
>
>
> How do I go about it?
>
> Ramya


Is the double quote actually part of your data or just a typo?

I am not sure that it matters in the end, but here is one approach:

 > x
[1] "UV7C11-F9-E1 MCS#9831019" "MCS Lot #9512516\""

Note that I have the double quote included in the second value, which  
is escaped when printed here.
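
For anyone who wants to reproduce this, the vector could be built as in the sketch below. This is an assumption about how the data was entered: the original post does not show how x was created, only how it prints.

```r
# Assumed construction of x; the trailing double quote in the second
# element has to be escaped inside a double-quoted R string.
x <- c("UV7C11-F9-E1 MCS#9831019", "MCS Lot #9512516\"")

# print() shows the escaped form; cat() shows the raw characters.
print(x)
cat(x, sep = "\n")
```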

 > gsub("^.*#([0-9]*).*$", "\\1", x)
[1] "9831019" "9512516"


This uses gsub() to replace each whole string with the digits captured  
by the parens in the regex, which the replacement refers to with the  
back reference "\\1".

Any characters from the beginning of the line to the '#' are dropped,  
as are any characters after the numeric sequence to the end of the line.
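
Two details of the pattern above may be worth keeping in mind: [0-9]* also accepts an empty run of digits, and a string with no '#' does not match at all, so it comes back unchanged. The following is only a sketch of one way to tighten this, assuming the number always follows a '#' and has six or more digits; the third test string and the NA handling are illustrative additions, not part of the original data.

```r
# The third element is a made-up string without an ID, for illustration only.
x <- c("UV7C11-F9-E1 MCS#9831019", "MCS Lot #9512516\"", "no id here")

# Same idea as above, but the capture group requires six or more digits.
pattern <- "^.*#([0-9]{6,}).*$"

# The pattern is anchored at both ends, so it can match at most once per
# string; sub() therefore gives the same result as gsub() here.
ids <- sub(pattern, "\\1", x)

# Strings that do not match are returned unchanged, so mark them as NA.
ids[!grepl(pattern, x)] <- NA
ids
```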

See ?gsub for more information.
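
If the '#' cannot be relied on, the "more than 5 digits" condition from the question can be used directly. This is a sketch of an alternative approach rather than the one shown above; it assumes an R version that provides regmatches() and that only the first qualifying run of digits in each string is wanted.

```r
x <- c("UV7C11-F9-E1 MCS#9831019", "MCS Lot #9512516\"")

# Locate the first run of six or more consecutive digits in each element.
# Shorter runs such as the "7" or "11" in "UV7C11" are skipped.
m <- regexpr("[0-9]{6,}", x)

# regmatches() returns the matched substrings; elements without a match
# are dropped from the result.
ids <- regmatches(x, m)
ids
```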

HTH,

Marc Schwartz



