[R] Rapache ( was Developing a web crawler )

Mike Marchywka marchywka at hotmail.com
Sun Mar 6 14:06:48 CET 2011

----------------------------------------
> Date: Thu, 3 Mar 2011 13:04:11 -0600
> From: Matt.Shotwell at vanderbilt.edu
> To: r-help at r-project.org
> Subject: Re: [R] Developing a web crawler / R "webkit" or something similar? [off topic]
>
> On 03/03/2011 08:07 AM, Mike Marchywka wrote:
> >
> >> Date: Thu, 3 Mar 2011 01:22:44 -0800
> >> From: antujsrv at gmail.com
> >> To: r-help at r-project.org
> >> Subject: [R] Developing a web crawler
> >>
> >> Hi,
> >>
> >> I wish to develop a web crawler in R. I have been using the functionality
> >> available in the RCurl package.
> >> I am able to extract the HTML content of the site but I don't know how to go
> >
> > In general this can be a big effort but there may be things in
> > text processing packages you could adapt to execute html and javascript.
> > However, I guess what I'd be looking for is something like a "webkit"
> > package or other open source browser with or without an "R" interface.
> > This actually may be an ideal solution for a lot of things as you get
> > all the content handlers of at least some browser.
> >
> >
> > Now that you mention it, I wonder if there are browser plugins to handle
> > "R" content (I'd have to give this some thought: put a script up as
> > a web page with MIME type "text/R" and have the browser execute it in R).
>
> There are server-side solutions for this sort of thing. See
> http://rapache.net/ . Also, there was a string of messages on R-devel
> some years ago addressing the mime type issue; beginning here:
> http://tolstoy.newcastle.edu.au/R/devel/05/11/3054.html . Though I don't
> know whether there was a resolution. Some suggestions were text/x-R,
> text/x-Rd, application/x-RData.
>
The rApache demo looks like something I could use right away,
but I haven't looked into the handlers yet. I have now installed rApache
on my Debian system (I still have config issues, but I did get apache2 to restart, LOL).
Before I plow too far into this, how would it compare or compete with something
like a PHP library for Rserve? That is the approach I had been pursuing.

Thanks. 



> -Matt