[R] readHTMLTable (XML package)
lopez235 at llnl.gov
Wed Jan 16 00:23:53 CET 2013
Thank you. That more or less did the trick. I got the data, though it's in a weird format compared to how it appears on the page and needs a lot of cleanup. But I was kind of expecting that.
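For anyone facing the same cleanup, here is a minimal sketch in base R. The data frame `raw`, its column names, and its values are invented for illustration and are not the actual intranet data; the idea is only that tables from readHTMLTable() often arrive as character columns with stray whitespace and thousands separators.

```r
# Hypothetical data frame resembling raw readHTMLTable() output
raw <- data.frame(
  V1 = c(" Site A ", "Site B\n"),
  V2 = c("1,234", "56"),
  stringsAsFactors = FALSE
)

clean <- raw
names(clean) <- c("site", "headcount")   # assumed, more descriptive names
clean$site <- trimws(clean$site)         # strip leading/trailing whitespace and newlines
# Drop thousands separators, then convert the text to numbers
clean$headcount <- as.numeric(gsub(",", "", clean$headcount, fixed = TRUE))
```

Which columns need which treatment depends on the page, so it is worth checking str(clean) after each step.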
From: Ista Zahn [mailto:istazahn at gmail.com]
Sent: Tuesday, January 15, 2013 3:18 PM
To: Lopez, Dan
Cc: R help (r-help at r-project.org)
Subject: Re: [R] readHTMLTable (XML package)
On Tue, Jan 15, 2013 at 5:31 PM, Lopez, Dan <lopez235 at llnl.gov> wrote:
> Hi Ista,
> It does exist. It’s a page in our company intranet.
> It is https so it looks like I can't use RCurl either. I tried RCurl BTW and got the below error.
Well, that error is not because RCurl doesn't work with the https protocol.
In my original example I meant to show
tabs <- readHTMLTable(getURL("https://en.wikipedia.org/wiki/List_of_countries_by_population"))
i.e., getURL() does work with https. (Well, maybe depending on your version of libcurl. See the getURL help page for details.)
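A small sketch of the parsing half of that call, runnable without network access. The string `html_txt` is a made-up stand-in for the text getURL() would return, and passing it through htmlParse(..., asText = TRUE) is my own suggestion for making it explicit that the input is HTML content rather than a file name.

```r
library(XML)

# Hypothetical stand-in for the page text returned by getURL()
html_txt <- paste0(
  "<html><body><table>",
  "<tr><th>country</th><th>population</th></tr>",
  "<tr><td>A</td><td>1,000</td></tr>",
  "<tr><td>B</td><td>2,500</td></tr>",
  "</table></body></html>"
)

# Parse the text as HTML, then extract every <table> as a data frame
doc  <- htmlParse(html_txt, asText = TRUE)
tabs <- readHTMLTable(doc, stringsAsFactors = FALSE)

# readHTMLTable() on a document returns a list with one element per table
first <- tabs[[1]]
```

With a real page, tabs usually holds several tables, so inspecting length(tabs) and names(tabs) helps locate the one you want.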
> Do you have experience with pulling a table off an https site?
Yes, I do :)
> If so how do I do that?
>> tabs <- readHTMLTable(getURL("https://hr-workforce-analytics.llnl.gov/wf_pi_pop.html"))
> Error in readHTMLTable(getURL("https://hr-workforce-analytics.llnl.gov/wf_pi_pop.html")) :
> error in evaluating the argument 'doc' in selecting a method for function 'readHTMLTable': Error in function (type, msg, asError = TRUE) :
> SSL certificate problem, verify that the CA cert is OK. Details:
> error:14090086:SSL routines:SSL3_GET_SERVER_CERTIFICATE:certificate
> verify failed
This is an RCurl FAQ (see http://www.omegahat.org/RCurl/FAQ.html). The quick and dirty way is
tabs <- readHTMLTable(getURL("https://hr-workforce-analytics.llnl.gov/wf_pi_pop.html", ssl.verifypeer = FALSE))
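As a rough sketch of the two options the FAQ discusses, the helper below either verifies the server against the CA bundle that ships with RCurl or switches peer verification off. The function name fetch_tables and its verify argument are hypothetical, not part of RCurl or XML. For an intranet site signed by a company-internal CA, the bundled certificates may not be enough, in which case cainfo would need to point at your organization's certificate file instead.

```r
# Hypothetical helper: fetch a page over https and return its tables.
fetch_tables <- function(url, verify = TRUE) {
  html_txt <- if (verify) {
    # Verify the server against the CA bundle shipped with RCurl
    RCurl::getURL(
      url,
      cainfo = system.file("CurlSSL", "cacert.pem", package = "RCurl")
    )
  } else {
    # Quick and dirty: skip certificate verification entirely
    RCurl::getURL(url, ssl.verifypeer = FALSE)
  }
  # Parse the downloaded text as HTML and extract all tables
  doc <- XML::htmlParse(html_txt, asText = TRUE)
  XML::readHTMLTable(doc, stringsAsFactors = FALSE)
}
```

Usage would look like tabs <- fetch_tables(url, verify = FALSE). Turning verification off removes protection against a spoofed server, so it is best kept to hosts you already trust.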
> -----Original Message-----
> From: Ista Zahn [mailto:istazahn at gmail.com]
> Sent: Tuesday, January 15, 2013 12:22 PM
> To: Lopez, Dan
> Cc: R help (r-help at r-project.org)
> Subject: Re: [R] readHTMLTable (XML package)
> Hi Dan,
> A couple of things: first, I think that file really does not exist (at
> least I can't open it in my web browser). Second, even if it did,
> url() cannot download from https, according to the details section of
> ?url, which points you to RCurl. So, once you verify that your URL
> actually exists you can do something like
> tabs <-
> On Tue, Jan 15, 2013 at 2:59 PM, Lopez, Dan <lopez235 at llnl.gov> wrote:
>> I am using XML::readHTMLTable and getting the below error. Does anyone know why? Does this function not work with https? I didn't see anything in help about that.
>> Error in htmlParse(doc) :
>> File https://hr-workforce-analytics.llnl.gov/wf_pi_pop.html does
>> not exist