[Rd] extracting tables from web pages?

Spencer Graves spencer.graves at structuremonitoring.com
Fri Apr 26 20:42:41 CEST 2013


On 4/25/2013 1:19 PM, Dirk Eddelbuettel wrote:
> On 25 April 2013 at 13:00, Spencer Graves wrote:
> | Hello:
> |
> |
> |        What tools would you recommend for extracting the table of
> | members of the US House of Representatives from
> | "http://house.gov/representatives/" and
> | "http://en.wikipedia.org/wiki/List_of_current_members_of_the_United_States_House_of_Representatives_by_age"?
> |
> |
> |
> |        I started writing something using getURL{RCurl}.  However, I'm
> | getting bogged down manually selecting character sequences to search for
> | and split on.
>
> You could try your own sos package to search what others have done here. The
> XML package is popular for this, but the whole scheme is fraught with little
> pitfalls, as HTML very definitely is not a good format for data delivery, and
> an HTML page clearly is no API for data access.


       Thanks to Gabriel Becker and Dirk Eddelbuettel for suggesting
the XML package:  its "readHTMLTable" function solves my problem.
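
       For readers who find this thread later, here is a minimal sketch
of the readHTMLTable approach.  It is not the code actually used for the
House pages:  the inline HTML and the names "html", "doc", "tables" and
"members" are made up for illustration, and header = TRUE assumes the
first row of the table holds the column labels.

```r
library(XML)  # provides htmlParse() and readHTMLTable()

# A tiny stand-in page (hypothetical data) so the sketch runs offline.
html <- '<html><body>
<table>
  <tr><th>Name</th><th>State</th><th>District</th></tr>
  <tr><td>Member A</td><td>CA</td><td>12</td></tr>
  <tr><td>Member B</td><td>TX</td><td>7</td></tr>
</table>
</body></html>'

# Parse the string as HTML; asText = TRUE says "html" is content, not a file name.
doc <- htmlParse(html, asText = TRUE)

# readHTMLTable() on a parsed document returns a list with one element
# per <table>.  header = TRUE treats the first row as column labels
# (an assumption about the page layout).
tables <- readHTMLTable(doc, header = TRUE, stringsAsFactors = FALSE)

# Pick the table of interest by position; real pages often contain
# several tables, so the index usually needs checking by hand.
members <- tables[[1]]
print(members)
```

       For a live page, the text returned by getURL{RCurl} could
presumably be passed to htmlParse(..., asText = TRUE) in the same way;
which element of "tables" holds the member list would have to be found
by inspection.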


       I confess that I tried "sos" before posting to this list without 
getting useful results:  The search terms I tried returned too many 
matches to be useful.


       And Gabriel was correct in that I should have sent the question 
to R-Help, but I only concluded that after sending it here.


       Thanks again.
       Spencer
>
> Dirk


