[R] reading tables from multiple HTML pages

Dennis Murphy djmuser at gmail.com
Mon Aug 29 20:39:17 CEST 2011


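?tryCatch

Wrapping the htmlParse() call in tryCatch() lets you handle the error for a
blank page and carry on with the next one instead of stopping the loop.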
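
A minimal sketch of that idea, applied to the loop quoted below. The helper
names fetch_third_table() and scrape_pages() are made up for illustration and
are not part of the XML package; the sketch also assumes you want to keep one
result per page rather than overwrite tbl on every pass. Any error raised for
a page (a blank page that htmlParse() cannot parse, or a page with fewer than
three tables) is reported and stored as NULL.

```r
# Sketch only: fetch_third_table() and scrape_pages() are hypothetical
# helper names, not functions from the XML package.

# Extract the third <table> from one page, as in the original loop.
fetch_third_table <- function(url) {
  doc <- XML::htmlParse(url)
  tableNodes <- XML::getNodeSet(doc, "//table")
  XML::readHTMLTable(tableNodes[[3]])
}

# Run `fetch` on every id. A failure on one page yields NULL for that page
# instead of aborting the whole loop.
scrape_pages <- function(ids, url_root, fetch = fetch_third_table) {
  results <- vector("list", length(ids))
  names(results) <- ids
  for (k in seq_along(ids)) {
    url <- paste(url_root, ids[k], sep = "")
    # results[k] <- list(...) keeps the slot even when the value is NULL
    results[k] <- list(tryCatch(
      fetch(url),
      error = function(e) {
        message("Skipping ", url, ": ", conditionMessage(e))
        NULL
      }
    ))
  }
  results
}

# Intended use (needs network access, so it is left commented out):
# tables <- scrape_pages(700:750,
#                        "http://www.szrd.gov.cn/viewcommondbfc.do?id=")
# tables <- Filter(Negate(is.null), tables)  # drop the pages that failed
```

Passing the fetching step in as an argument is only a convenience: it lets you
try the error handling with a stand-in function before pointing it at the real
site.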

HTH,
Dennis

On Mon, Aug 29, 2011 at 9:04 AM, s1oliver <s1oliver at ucsd.edu> wrote:
> Hi, I am a beginner to R and am having some problems scraping data from
> tables in HTML using the XML package. I have included some code below.
>
> I am trying to loop through a series of HTML pages, each of which contains a
> single table from which I want to scrape data. However, some of the pages
> are blank, so htmlParse() throws an error when it reaches one of them. The
> loop then stops and I get the error message below:
>
> Error in htmlParse(url) :
>  error in creating parser for
> http://www.szrd.gov.cn/viewcommondbfc.do?id=728
>
> What is the best way to keep the loop running so I can parse the rest of
> the pages?
>
> ****************************************************
>
> library(XML)
>
> url_root<-"http://www.szrd.gov.cn/viewcommondbfc.do?id="
>
> for(i in 700:750){
>        url = paste(url_root, i, sep="")
>        doc = htmlParse(url)
>
>        tableNodes = getNodeSet(doc, "//table")
>        tbl = readHTMLTable(tableNodes[[3]])
> }
> ****************************************************
>
> Steve Oliver
> Department of Political Science
> University of California at San Diego
> 9500 Gilman Dr.
> La Jolla, CA 92092
>
> --
> View this message in context: http://r.789695.n4.nabble.com/reading-tables-from-multiple-HTML-pages-tp3776605p3776605.html
> Sent from the R help mailing list archive at Nabble.com.