[R] Pattern Matching Replacement

Gabor Grothendieck ggrothendieck at gmail.com
Thu Jun 19 21:04:47 CEST 2008

On Thu, Jun 19, 2008 at 2:17 PM, ppatel3026
<pratik.patel at us.rothschild.com> wrote:
> I would like to replace "\r\n" with "" in a character string, where "\r\n"
> exists only between < and >, how could I do that?
> Initial:
> characterString = "<XML><tag1
> id=\"F\r\n2\"></t\r\nag1>\r\n<tag\r\n2></tag2></XML>"
> Result:
> characterString = "<XML><tag1 id=\"F2\"></tag1>\r\n<tag2></tag2></XML>"
> Tried with sub(below) but it only replaces the first instance and I am not
> sure how to pattern match so that it only replaces \r\n that exist within
> tags(< and >).
> sub("\r\n", "", charStream)

I assume you want to delete all \r and all \n in tags and not
just \r\n, but if it's just \r\n then modify the 2nd regular expression
accordingly and the rest should work the same.

gsubfn, from the package of the same name, is like gsub except that
instead of replacing each occurrence of the regular expression with
a fixed string, it feeds each match into the function specified as
the second argument and replaces the match with the output of that
function.  The function can alternatively be specified as a formula,
as it is here, in which case the right side of the formula specifies
the function body and the formal arguments of the function are
constructed from the free variables, in this case just x.  See the
gsubfn home page at http://gsubfn.googlecode.com .

library(gsubfn)

characterString <-
"<XML><tag1 id=\"F\r\n2\"></t\r\nag1>\r\n<tag\r\n2></tag2></XML>"

# For each tag (from < to the next >), delete every \r and \n inside it
gsubfn("<[^>]*>", ~ gsub("[\r\n]", "", x), characterString)
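If installing gsubfn is not an option, roughly the same idea can be sketched in base R. This is only an illustration, not part of the gsubfn approach above: it assumes a version of R recent enough to provide regmatches() and its replacement form, and it removes only the literal pair \r\n rather than every \r and \n. The name result is made up for the example.

```r
characterString <-
"<XML><tag1 id=\"F\r\n2\"></t\r\nag1>\r\n<tag\r\n2></tag2></XML>"

# Locate every tag, i.e. each stretch from "<" up to the next ">".
m <- gregexpr("<[^>]*>", characterString)

# Work on a copy so the original string is left untouched.
result <- characterString

# Remove the literal "\r\n" pair inside each matched tag and write the
# cleaned tags back into place; text outside the tags is not modified.
regmatches(result, m) <- lapply(
  regmatches(result, m),
  function(tags) gsub("\r\n", "", tags, fixed = TRUE)
)

result
```

The \r\n between </tag1> and <tag2> sits outside any tag, so it should be left alone, which is what the original question asked for.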
