[R] How to download this data?

Ron Michael ron_michael70 at yahoo.com
Sat Aug 3 10:05:33 CEST 2013


In the meantime I have sorted this problem out, hopefully correctly. I modified that line of your code to:
 
rawOrig = getURLContent("https://www.theice.com/productguide/ProductSpec.shtml?specId=219#expiry", ssl.verifypeer = FALSE)
 
However, I then ran into another problem when executing:
 > u = sprintf("<a href="https://www.theice.com/productguide/ProductSpec.shtml;jsessionid=%s?expiryDates=&specId=219">https://www.theice.com/productguide/ProductSpec.shtml;jsessionid=%s?expiryDates=&specId=219", jsession) 
Error: unexpected symbol in "u = sprintf("<a href="https"

Can you or someone else help me get past this error?
 
Also, another question: where did you get this expression from?
"<a href="https://www.theice.com/productguide/ProductSpec.shtml;jsessionid=%s?expiryDates=&specId=219">https://www.theice.com/productguide/ProductSpec.shtml;jsessionid=%s?expiryDates=&specId=219"
 
I would really appreciate it if someone could help me understand that.
 
Thank you.
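
[Note on the sprintf() error above: the "unexpected symbol" message is what R reports when a double-quoted string contains an unescaped double quote. The string ends at the second quote and the parser then trips over the text that follows. The <a href="..."> wrapper is not in the originally posted code and was most likely added by mail software turning the URL into a link. A minimal sketch of the intended call, using a made-up session id (ABC123DEF456 is purely hypothetical):]

```r
# Hypothetical session id, standing in for the value extracted from the page.
jsession <- "ABC123DEF456"

# The format string is a single double-quoted string: a plain URL with a %s placeholder.
u <- sprintf("https://www.theice.com/productguide/ProductSpec.shtml;jsessionid=%s?expiryDates=&specId=219", jsession)
print(u)

# If literal double quotes are ever needed inside a string, wrap the string
# in single quotes (or escape each inner quote as \").
link <- sprintf('<a href="%s">data page</a>', u)
print(link)
```

[Only the first sprintf() call is needed for the download; the second just illustrates how to quote a string that really must contain double quotes.]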


----- Original Message -----
From: Ron Michael <ron_michael70 at yahoo.com>
To: Duncan Temple Lang <dtemplelang at ucdavis.edu>; "r-help at r-project.org" <r-help at r-project.org>
Cc: 
Sent: Saturday, 3 August 2013 12:58 PM
Subject: Re: [R] How to download this data?

Hello Duncan,
 
Thank you very much for your pointer.
 
However, when I tried to run your code, I got the following error:
 > rawOrig = getURLContent("https://www.theice.com/productguide/ProductSpec.shtml?specId=219#expiry") 
Error in function (type, msg, asError = TRUE)  : 
  SSL certificate problem, verify that the CA cert is OK. Details:
error:14090086:SSL routines:SSL3_GET_SERVER_CERTIFICATE:certificate verify failed

Can someone help me to understand what could be the cause of this error?
 
Thank you.


----- Original Message -----
From: Duncan Temple Lang <dtemplelang at ucdavis.edu>
To: r-help at r-project.org
Cc: 
Sent: Saturday, 3 August 2013 4:33 AM
Subject: Re: [R] How to download this data?


That URL uses HTTPS (secure HTTP), not plain HTTP.
The XML parser cannot retrieve the file over HTTPS.
Instead, use the RCurl package to get the file.

However, it is more complicated than that. If
you look at the source of the HTML page in a browser,
you'll see a jsessionid, which is a session identifier.

The following retrieves the content of your URL and then
parses it and extracts the value of the jsessionid.
Then we create the full URL to the actual data page (which is actually in the HTML
content but in JavaScript code)

library(RCurl)
library(XML)

rawOrig = getURLContent("https://www.theice.com/productguide/ProductSpec.shtml?specId=219#expiry")
rawDoc = htmlParse(rawOrig)
tmp = getNodeSet(rawDoc, "//@href[contains(., 'jsessionid=')]")[[1]]
jsession = gsub(".*jsessionid=([^?]+)\\?.*", "\\1", tmp)

u = sprintf("https://www.theice.com/productguide/ProductSpec.shtml;jsessionid=%s?expiryDates=&specId=219", jsession)

doc = htmlParse(getURLContent(u))
tbls = readHTMLTable(doc)
data = tbls[[1]]

dim(data)


I did this quickly so it may not be the best way or completely robust, but hopefully
it gets the point across and does get the data.
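
[Note: to see what the jsessionid extraction step does in isolation, here is a small offline sketch. The href value and the session id in it are invented for illustration; the real value comes from the page source. The pattern escapes the "?" so that it matches a literal question mark.]

```r
# Hypothetical href of the kind found in the page source; the session id is invented.
href <- "/productguide/ProductSpec.shtml;jsessionid=ABC123DEF456?specId=219"

# Keep only the text between "jsessionid=" and the next literal "?".
jsession <- sub(".*jsessionid=([^?]+)\\?.*", "\\1", href)
print(jsession)

# Build the URL of the data page from the extracted id.
u <- sprintf("https://www.theice.com/productguide/ProductSpec.shtml;jsessionid=%s?expiryDates=&specId=219", jsession)
print(u)
```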

  D.

On 8/2/13 2:42 PM, Ron Michael wrote:
> Hi all,
>  
> I need to download the data from this web page:
>  
> https://www.theice.com/productguide/ProductSpec.shtml?specId=219#expiry
>  
> I used the function readHTMLTable() from the XML package, but could not download it.
>  
> Can somebody help me get the data into my R session?
>  
> Thank you.
> 
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
