[R] Extracting text from html code using the RCurl package.
tony.breyal at googlemail.com
Mon Oct 6 17:45:55 CEST 2008
I want to download the text from a web page, but what I end up with
is the HTML code. Is there some option in the RCurl package that I am
missing, or is there another way to achieve this? This is the code I
am using:
> library(RCurl)
> my.url <- 'https://stat.ethz.ch/mailman/listinfo/r-help'
> html.file <- getURI(my.url, ssl.verifyhost = FALSE, ssl.verifypeer = FALSE, followlocation = TRUE)
I thought the htmlTreeParse() function from the XML package might
help, but I don't know what to do with its output next.
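One common approach, sketched here under assumptions, is to parse the downloaded string with the XML package and keep only the text nodes, skipping anything inside <script> or <style> elements. This assumes a version of XML that provides htmlParse() and xpathSApply() (the 1.94-0 release in the session info may predate the latter). html_to_text() is a hypothetical helper name, and the XPath filter is one reasonable choice rather than a complete definition of "visible" text.

```r
## Minimal sketch, assuming the XML package provides htmlParse() and xpathSApply().
## html_to_text() is a hypothetical helper, not part of RCurl or XML.
library(XML)

html_to_text <- function(html) {
  # Parse the HTML held in a character string (asText = TRUE), not a file path.
  doc <- htmlParse(html, asText = TRUE)
  # Collect every text node that is not inside a <script> or <style> element.
  nodes <- xpathSApply(doc,
                       "//text()[not(ancestor::script)][not(ancestor::style)]",
                       xmlValue)
  free(doc)  # release the internal (C-level) document
  # Trim leading/trailing whitespace and drop fragments that end up empty.
  nodes <- gsub("^\\s+|\\s+$", "", nodes)
  nodes[nzchar(nodes)]
}
```

With html.file from the code above, cat(html_to_text(html.file), sep = "\n") should print the page's text one fragment per line. Using XPath on the parsed tree, rather than stripping tags with a regular expression, avoids mangling text when attributes or comments contain ">" characters.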
Many thanks for any help you can provide,
R version 2.7.2 (2008-08-25)
LC_COLLATE=English_United Kingdom.1252;LC_CTYPE=English_United Kingdom.
attached base packages:
 stats graphics grDevices utils datasets methods
other attached packages:
 XML_1.94-0 RCurl_0.9-4