[R] Parsing of HTML files in R

Douglas Bates bates at stat.wisc.edu
Thu Oct 25 16:38:55 CEST 2001


Duncan Temple Lang <duncan at research.bell-labs.com> writes:

> If my memory serves me correctly, I believe that Dan Veillard's libxml
> library provides an adaptation of the XML parser that handles HTML. In
> that case, I can add something to the XML package that allows us to
> access the HTML parser and use the same interface for both XML and
> HTML from within R. I'll take a look and see if this is relatively
> easy to do.

Alternatively, try transforming your HTML to XHTML, which can be parsed
as XML.  See the documentation on the "tidy" utility at

                http://www.w3.org/People/Raggett/tidy/
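
To illustrate why the conversion helps, here is a minimal sketch in
Python (standard library only, chosen just for a self-contained demo;
it is not R and not part of the XML package). The two sample strings
and the helper names paragraphs() and parses_as_xml() are hypothetical.
The XHTML string is hand-written to resemble what a cleanup tool might
produce; it is not actual tidy output.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample inputs, not taken from the thread.
# Loose HTML: <br> and <p> are left unclosed, which browsers tolerate.
LOOSE_HTML = "<html><body><p>First<br><p>Second</body></html>"
# The same content written as well-formed XHTML: every element is closed.
XHTML = "<html><body><p>First<br/></p><p>Second</p></body></html>"


def parses_as_xml(markup):
    """Return True if a strict XML parser accepts the markup."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


def paragraphs(markup):
    """Parse markup as XML and return the leading text of each <p> element."""
    root = ET.fromstring(markup)
    return [p.text for p in root.iter("p")]


if __name__ == "__main__":
    print(parses_as_xml(LOOSE_HTML))  # the unclosed tags are rejected
    print(parses_as_xml(XHTML))       # the well-formed version is accepted
    print(paragraphs(XHTML))
```

Real converter output would likely also carry a DOCTYPE and the XHTML
namespace, in which case element names have to be namespace-qualified
when searching the tree.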