[R] Need some help with regular expression

Steven Nagy nstefi at gmail.com
Sun Nov 20 05:06:36 CET 2016


I tried out a regular expression on this website:

http://regexr.com/3en1m

So the input text is:

"Name.MEMBER_TYPE:  -> STU"

The regular expression is: ((?:\w+|\s) -> STU|STU -> (?:\w+|\s))

And it returns:

"  -> STU"

but when I use it in R, it doesn't return the same result:

library(gsubfn)  # strapply() is provided by the gsubfn package
strapply(c, "((?:\\w+|\\s) -> STU|STU -> (?:\\w+|\\s))", c, backref = -1,
perl = TRUE)

returns:
"Name.MEMBER_TYPE: -> STU"
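To separate the pattern from strapply(), the same regular expression can be run through base R's regexpr() and regmatches(), which use PCRE when perl = TRUE. This is a minimal sketch; the two test strings are assumptions, because the input on regexr.com has two spaces before "->" while the R output above shows only one, and it is not clear from the message which of the two matches the real data:

```r
pattern <- "((?:\\w+|\\s) -> STU|STU -> (?:\\w+|\\s))"

# Assumed inputs: two spaces before "->" (as entered on regexr.com)
# and a single space (as in the R output above)
two_spaces <- "Name.MEMBER_TYPE:  -> STU"
one_space  <- "Name.MEMBER_TYPE: -> STU"

# regexpr() + regmatches() return only the matched substring
regmatches(two_spaces, regexpr(pattern, two_spaces, perl = TRUE))
# returns "  -> STU": \s takes the first space, " -> STU" the rest

regmatches(one_space, regexpr(pattern, one_space, perl = TRUE))
# returns character(0): the only space belongs to " -> STU",
# and the ":" before it is neither \w nor \s
```

If the base R result differs from what strapply() returns for the same string, that points at how strapply() is being called rather than at the pattern itself.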

Here is what I was trying to do:

I need to extract some values from a log table, and I created a regular
expression that helps me with that.

The log table has cells with values like:

a = "Name.MEMBER_TYPE: NMA -> STU ; CATEGORY:  -> 1 ; CITY: MISSISSAUGA -> Mississauga ; ZIP: L5N1H9 -> L5N 1H9 ; COUNTRY: CAN ->  ; MEMBER_STATUS:  -> N"

or
b = "Name.MEMBER_TYPE: STU -> REG ; CATEGORY: 1 ->" 

so I needed to extract the values that a STU member type is changing from
and to: NMA and STU in the first case, STU and REG in the second.

I came up with this expression which worked in both cases:

strapply(strapply(a, "(\\w+ -> STU|STU -> \\w+)", c, backref = -1, perl =
TRUE), "(\\w+) -> (\\w+)", c, backref = -2, perl = TRUE)
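The two strapply() passes can also be collapsed into one base R call with two capture groups. This is a minimal sketch, not a fix for strapply(): it swaps in regexec() and regmatches() instead of gsubfn, and it assumes the member type always follows the literal label "MEMBER_TYPE: ", so it picks out the member-type pair first and checks for STU afterwards. The helper name member_type_pair() is hypothetical:

```r
a <- "Name.MEMBER_TYPE: NMA -> STU ; CATEGORY:  -> 1 ; CITY: MISSISSAUGA -> Mississauga ; ZIP: L5N1H9 -> L5N 1H9 ; COUNTRY: CAN ->  ; MEMBER_STATUS:  -> N"
b <- "Name.MEMBER_TYPE: STU -> REG ; CATEGORY: 1 ->"

# Hypothetical helper: the member type before and after " -> ",
# assuming it always follows the literal label "MEMBER_TYPE: "
member_type_pair <- function(x) {
  m <- regmatches(x, regexec("MEMBER_TYPE: (\\w+) -> (\\w+)", x))[[1]]
  m[-1]  # drop the full match, keep the two capture groups
}

member_type_pair(a)             # "NMA" "STU"
member_type_pair(b)             # "STU" "REG"
"STU" %in% member_type_pair(a)  # TRUE
```

Like the original pattern, this one uses \w+, so it still needs at least one word character on each side of "->".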

But I had a third case, where the source member type was blank:

c = "Name.MEMBER_TYPE: -> STU"

and in that case it returned an error:

strapply(strapply(c, "(\\w+ -> STU|STU -> \\w+)", c, backref = -1, perl =
TRUE), "(\\w+) -> (\\w+)", c, backref = -2, perl = TRUE)

Error: is.character(x) is not TRUE

I found that the error occurs because the inner strapply() call returns NULL:

strapply(c, "(\\w+ -> STU|STU -> \\w+)", c, backref = -1, perl = TRUE)
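The NULL is consistent with the quantifier: \w+ needs at least one word character right before " -> STU", and in c that character is a colon. Replacing + with * (zero or more) lets the source value be empty. A minimal base R sketch of that idea, applied to both steps; it uses regexpr()/regexec() rather than strapply(), and the variable is named x only to avoid confusion with the c() function:

```r
x <- "Name.MEMBER_TYPE: -> STU"  # same text as c above

# "\w+" requires at least one word character right before " -> STU";
# here that character is ":", so the original first step finds nothing
grepl("\\w+ -> STU|STU -> \\w+", x, perl = TRUE)
# returns FALSE

# "\w*" also accepts zero characters, so a blank source value is allowed
step1 <- regmatches(x, regexpr("\\w* -> STU|STU -> \\w*", x, perl = TRUE))
step1
# returns " -> STU"

# Second step with the same change: the first capture group is now empty
step2 <- regmatches(step1, regexec("(\\w*) -> (\\w*)", step1))[[1]][-1]
step2
# returns "" and "STU"
```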

So I tried to modify the regular expression to match either a word or a blank
space:

strapply(c, "((?:\\w+|\\s) -> STU|STU -> (?:\\w+|\\s))", c, backref = -1,
perl = TRUE)

but this returned the whole value of "c":

"Name.MEMBER_TYPE:  -> STU"

and I only needed "  -> STU", as shown on regexr.com.

Is the result on the regexr.com website wrong, or does strapply return the
wrong result?

Thanks,

Steven
