[R] gsub: replacing a.*a if no occurrence of b in .*

Charilaos Skiadas skiadas at hanover.edu
Sat Feb 24 17:24:34 CET 2007


All these methods assume that you don't have nested <tag>s, like so:

<tag><tag>foo</tag>useful stuff</tag>some garbage</tag>

For that you would really need a true parser. So I would double-check  
to make sure this doesn't happen.
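
One way to do that double-check is sketched below in base R. This is only a rough test under assumptions: the tag is literally named "tag" with no attributes, and has_nested is a name I made up for illustration. It flags any <tag> that is followed by a second <tag> before a </tag> shows up.

```r
# Hypothetical helper: TRUE if an opening <tag> is followed by another
# opening <tag> before any closing </tag> appears.
has_nested <- function(x) {
  # (?s) lets "." match newlines; (?:(?!</tag>).)*? walks forward one
  # character at a time and refuses to step over a closing tag.
  grepl("(?s)<tag>(?:(?!</tag>).)*?<tag>", x, perl = TRUE)
}

nested <- "<tag><tag>foo</tag>useful stuff</tag>some garbage</tag>"
flat   <- "<tag>a</tag><tag>b</tag>"

has_nested(nested)
has_nested(flat)
```

If this returns TRUE for any of your files, the regex-based fixes are not safe for that file and a real parser is the better route.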

Do you have any control over where those XML files are generated,  
though? It sounds to me like it might be easier to fix the utility  
generating those XML files, since it is clearly doing something wrong.

On Feb 24, 2007, at 11:07 AM, Gabor Grothendieck wrote:

> I assume <tag> is known.
>
> This removes any occurrence of </tag>.*</tag> where .* does not
> contain <tag> or </tag>.
>
> The regular expression, re, matches </tag>, then does a non-greedy
> match (?U) for anything followed by </tag> but uses a zero
> width lookahead subexpression (?=...) for the second </tag>
> so that it can be matched again.  gsubfn in package
> gsubfn is like the usual gsub except that instead of
> replacing the match with a string it passes the match
> to function f and then replaces the match with the output
> of f.  See the gsubfn home page:
>   http://code.google.com/p/gsubfn/
> and vignette.
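
For anyone who wants to try the idea without installing gsubfn, here is a rough base-R sketch of the same approach. It is not the code from the quoted message: instead of passing each match to a function, it folds the "contains no <tag> or </tag>" condition into the pattern with a negative lookahead. The input string and the literal tag name "tag" are made up for illustration.

```r
# Hypothetical input: stray text followed by an extra closing tag.
x <- "<tag>useful stuff</tag>some garbage</tag> and <tag>more</tag>"

# Match a closing tag, then the shortest run of characters that contains
# neither <tag> nor </tag>, stopping just before the next closing tag.
# The final (?=</tag>) is zero width, so that closing tag is not consumed
# and can start the next match.
re <- "</tag>(?:(?!</?tag>).)*?(?=</tag>)"

cleaned <- gsub(re, "", x, perl = TRUE)
cleaned
```

The text between a </tag> and a following <tag> is left alone, because the negative lookahead refuses to step over the opening tag.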

Haris Skiadas
Department of Mathematics and Computer Science
Hanover College


