[R] how to read a web page and extract an html table?

Pikounis, Bill v_bill_pikounis at merck.com
Tue May 6 17:24:48 CEST 2003


Adrian,

> I want to extract the table from the html file. 
> Is there a function html2R, the opposite of R2html? 
> How should I do this? 

Parsing arbitrary HTML is generally a nontrivial task.  I would recommend
using something like Perl to convert the HTML to delimited ASCII, and then
reading the result into R with read.table(), for example.  Perl has modules
that can help with the "HTML-2-ASCII" step, if not do it entirely.  I have
never used one myself, but I am sure CPAN can be searched for one.
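
For a simple, well-formed table, the same two-step idea (convert the HTML
to delimited text, then call read.table()) can also be sketched in base R
with regular expressions instead of Perl.  This is only a rough
illustration, not a general HTML parser: the function name
html_table_to_df and the sample page are hypothetical, the code assumes no
nested tables and no rowspan/colspan, and it relies on the text= argument
of read.table() and on trimws(), which need a reasonably current R.

```r
# Hypothetical helper: pull the first <table> out of HTML lines with
# regular expressions, then hand tab-delimited text to read.table().
# Assumes a simple table: no nesting, no rowspan/colspan, no tabs in cells.
html_table_to_df <- function(html_lines, header = TRUE) {
  # Work on one long string so that tags split across lines still match.
  html <- paste(html_lines, collapse = " ")

  # Isolate the first <table>...</table> block (case-insensitive, non-greedy).
  tab <- regmatches(html, regexpr("(?i)<table\\b.*?</table>", html, perl = TRUE))
  if (length(tab) == 0) stop("no <table> element found")

  # Split the table into its <tr> rows.
  rows <- regmatches(tab, gregexpr("(?i)<tr\\b.*?</tr>", tab, perl = TRUE))[[1]]

  row_to_line <- function(row) {
    # Each <td> or <th> element is one cell.
    cells <- regmatches(
      row, gregexpr("(?i)<t[dh]\\b[^>]*>.*?</t[dh]>", row, perl = TRUE)
    )[[1]]
    cells <- gsub("<[^>]+>", "", cells)                # strip all tags
    cells <- gsub("&nbsp;", " ", cells, fixed = TRUE)  # common HTML entity
    cells <- trimws(gsub("\\s+", " ", cells))          # normalise whitespace
    paste(cells, collapse = "\t")
  }
  lines <- vapply(rows, row_to_line, character(1), USE.NAMES = FALSE)

  # The "delimited ASCII" step is done; let read.table() do the rest.
  read.table(text = lines, sep = "\t", header = header,
             quote = "", comment.char = "", stringsAsFactors = FALSE)
}

# Made-up page standing in for the result of readLines() on a URL.
sample_page <- c(
  "<html><body>",
  "<table border=1>",
  "<tr><th>name</th><th>value</th></tr>",
  "<tr><td>a</td><td>1.5</td></tr>",
  "<tr><td>b</td><td>2</td></tr>",
  "</table>",
  "</body></html>"
)
df <- html_table_to_df(sample_page)
```

With the aux2 vector from the original question, the call would be
html_table_to_df(aux2).  Real-world pages often break the assumptions
above, which is why a dedicated parsing tool is the safer route.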

Hope that helps,
Bill


----------------------------------------
Bill Pikounis, Ph.D.
Biometrics Research Department
Merck Research Laboratories
PO Box 2000, MailDrop RY84-16  
126 E. Lincoln Avenue
Rahway, New Jersey 07065-0900
USA

v_bill_pikounis at merck.com

Phone: 732 594 3913
Fax: 732 594 1565


> -----Original Message-----
> From: Adi Humbert [mailto:adrian_humbert at yahoo.com]
> Sent: Tuesday, May 06, 2003 10:31 AM
> To: r-help at stat.math.ethz.ch
> Cc: adrian_humbert at yahoo.com
> Subject: [R] how to read a web page and extract an html table?
> 
> 
> Hello all, 
> 
> I want to read a table from a given web page. 
> 
> If I do something like
> > str <- "http://www...."        # this is the web address
> > aux1 <- url(str, open = "rt")  # open connection
> > aux2 <- readLines(aux1)        # read web page
> > close(aux1)                    # close the connection when done
> aux2 contains the html file. 
> 
> I want to extract the table from the html file. 
> Is there a function html2R, the opposite of R2html? 
> How should I do this? 
> 
> Thanks, 
> Adrian
> 
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://www.stat.math.ethz.ch/mailman/listinfo/r-help
>



