[R] htmlParse (from XML library) working sporadically in the same code

Duncan Temple Lang dtemplelang at ucdavis.edu
Wed Mar 20 19:18:06 CET 2013


When readHTMLTable() or, more generally, the HTML/XML parser fails to retrieve
a URL, I suggest you check whether a different approach works.
You can use the download.file() function or readLines(url()) or
getURLContent() from the RCurl package to get the content of the URL.
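
As a rough sketch of that suggestion, the hypothetical helper fetch_page() below tries readLines(url()) first, then download.file() to a temporary file, and finally RCurl::getURLContent() if RCurl happens to be installed, returning the first non-empty result as a single string. The helper names and the order of the fallbacks are assumptions for illustration; none of them are part of the XML or RCurl packages.

```r
# Hypothetical helpers (not part of XML or RCurl): each returns the page
# HTML as one character string, or signals an error.
read_with_url <- function(u) {
  con <- url(u)
  on.exit(close(con))
  paste(readLines(con, warn = FALSE), collapse = "\n")
}

read_with_download <- function(u) {
  tmp <- tempfile(fileext = ".html")
  on.exit(unlink(tmp))
  # download.file() returns 0 on success
  if (download.file(u, tmp, quiet = TRUE) != 0) stop("download.file() failed")
  paste(readLines(tmp, warn = FALSE), collapse = "\n")
}

fetch_page <- function(u) {
  fetchers <- list(read_with_url, read_with_download)
  # Only try RCurl if it is installed
  if (requireNamespace("RCurl", quietly = TRUE))
    fetchers <- c(fetchers, list(function(x) RCurl::getURLContent(x)))
  for (f in fetchers) {
    # Treat any error as "this method failed" and move on to the next one
    txt <- tryCatch(f(u), error = function(e) NULL)
    if (is.character(txt) && length(txt) == 1 && nzchar(txt)) return(txt)
  }
  stop("all download methods failed for ", u)
}
```

Usage would then be, for example, txt <- fetch_page(url) followed by readHTMLTable(htmlParse(txt, asText = TRUE)).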

Then you can pass that content to readHTMLTable() via
  readHTMLTable(htmlParse(text, asText = TRUE))
or
  readHTMLTable(text,  asText = TRUE)
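
The parse-from-text step can be tried without any network access. In this minimal sketch the short HTML string is an invented stand-in for whatever text was downloaded; it does not reflect the actual structure of the London Stock Exchange page.

```r
library(XML)

# Invented sample standing in for the downloaded page text
html_text <- paste(
  "<html><body><table>",
  "<tr><th>Code</th><th>Price</th></tr>",
  "<tr><td>AAA</td><td>10.5</td></tr>",
  "<tr><td>BBB</td><td>20.25</td></tr>",
  "</table></body></html>",
  sep = "\n"
)

# Parse the string itself (asText = TRUE), so nothing is fetched here
doc <- htmlParse(html_text, asText = TRUE)

# readHTMLTable() returns a list with one data frame per <table> element
tables <- readHTMLTable(doc)
tab <- tables[[1]]
print(tab)
```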

 D.

On 3/20/13 10:07 AM, Andre Zege wrote:
> I am using htmlParse from the XML library on a particular website. Sometimes the code fails and sometimes it works; most of the time it doesn't, and I cannot see why. The file I am trying to parse is 
> 
> http://www.londonstockexchange.com/exchange/prices-and-markets/international-markets/indices/home/sp-500.html?page=0
> 
> 
> Sometimes the following code works
> n<-readHTMLTable(htmlParse(url))
> 
> 
> But most of the time htmlParse returns the following error:
> 
> Error: failed to load HTTP resource
> 
> 
> The error comes from the following line in the htmlParse code:
>  
>   ans <- .Call("RS_XML_ParseTree", as.character(file), handlers, as.logical(ignoreBlanks), as.logical(replaceEntities), as.logical(asText), as.logical(trim), as.logical(validate), as.logical(getDTD), as.logical(isURL), as.logical(addAttributeNamespaces), as.logical(useInternalNodes), as.logical(isHTML), as.logical(isSchema), as.logical(fullNamespaceInfo), as.character(encoding), as.logical(useDotNames), xinclude, error, addFinalizer, as.integer(options), PACKAGE = "XML")
> 
> 
> 
> By the way, readHTMLTable(htmlParse(url)) works fine on other pages, so the problem is somehow related to this page. 
> 
> I am using 64-bit R 2.15.3 on a Windows machine.
> 
> Thanks much
> Andre
> 
> 
> 
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>


