[R] Good Package(s) for String and URL processing?

Erik Iverson eriki at ccbr.umn.edu
Fri Jul 2 05:51:45 CEST 2010


Ralf B wrote:
> Are there packages that allow improved String and URL processing?
> E.g. extract parts of a URLs such as sub-domains, top-level domain,
> protocols (e.g. https, http, ftp), file type based on endings, check
> if a URL is valid or not, etc...
> 
> I am currently only using split and paste. Are there better and more
> efficient ways to handle strings e.g. finding sub-strings or to do
> pattern matching?
> What packages do you use if you have to do a lot of String processing
> and you don't have the option to go to another language such as Perl
> or Python?


Well, much of the power of Perl is built on top of regular expressions, which R 
also supports.

See ?regex for more details, and also the help pages for the R functions 
?grep, ?sub, etc.
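
For instance, here is a minimal sketch of those functions in base R.  The 
example URLs and variable names are made up for illustration, and the 
patterns are deliberately simple rather than a complete URL grammar:

```r
# Hypothetical example strings, made up for illustration
urls <- c("https://www.example.com/docs/report.pdf",
          "ftp://files.example.org/data.csv",
          "not a url")

# grepl() tests each string against a pattern and returns a logical vector
has_scheme <- grepl("^(https?|ftp)://", urls)

# sub() replaces the first match only; here it strips a leading "scheme://"
no_scheme <- sub("^[a-z]+://", "", urls)

# regexpr() locates the first match and regmatches() extracts it;
# the lookahead (?=://) needs perl = TRUE. Non-matching strings are dropped.
scheme <- regmatches(urls, regexpr("^[a-z]+(?=://)", urls, perl = TRUE))

print(has_scheme)
print(no_scheme)
print(scheme)
```

Note that regmatches() only returns the elements that matched, so the 
result can be shorter than the input vector.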

I can also highly recommend the book "Mastering Regular Expressions".  It does 
not cover R explicitly, but what you learn there can be applied directly to 
R.  Regular expressions go a very long way toward finding substrings and 
pattern matching.
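
As a rough illustration of how far a single pattern can go, the sketch below 
splits a URL into the pieces asked about.  The function name parse_url, the 
pattern, and the example URLs are my own assumptions for illustration; the 
pattern ignores ports, user info and query strings, and the sub-domain logic 
naively treats the last two host labels as the registered domain (which is 
wrong for hosts such as example.co.uk):

```r
# Simplified pattern (an assumption, not a full URL grammar):
#   group 1 = scheme, group 2 = host, group 3 = path
url_pattern <- "^([a-z][a-z0-9+.-]*)://([^/?#:]+)([^?#]*)"

# Hypothetical helper: returns a list of URL parts, or NULL if no match
parse_url <- function(url) {
  # regexec() records the capture groups; regmatches() extracts them
  m <- regmatches(url, regexec(url_pattern, url))[[1]]
  if (length(m) == 0) return(NULL)   # pattern did not match

  host  <- m[3]
  path  <- m[4]
  parts <- strsplit(host, ".", fixed = TRUE)[[1]]
  n     <- length(parts)

  list(
    scheme    = m[2],
    host      = host,
    tld       = parts[n],
    # naive: everything before the last two labels
    subdomain = if (n > 2) paste(parts[1:(n - 2)], collapse = ".") else "",
    # file type taken from the ending of the path, if there is one
    extension = if (grepl("\\.[[:alnum:]]+$", path)) sub(".*\\.", "", path) else ""
  )
}

res <- parse_url("https://www.example.com/docs/report.pdf")
str(res)

bad <- parse_url("not a url")   # NULL, since the pattern does not match
```

Returning NULL for a non-match gives a crude validity check, but only in the 
sense of "looks like scheme://host/path", not a full validation.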

You might find some things in RCurl helpful:

http://www.omegahat.org/RCurl/

Probably others...


