[R] Accessing data via url

Mike Marchywka marchywka at hotmail.com
Fri Jan 7 12:46:40 CET 2011





> Date: Fri, 7 Jan 2011 00:24:19 -0800
> From: dieter.menne at menne-biomed.de
> To: r-help at r-project.org
> Subject: Re: [R] Accessing data via url
>
>
>
> John Kane-2 wrote:
> >
> > # Can anyone suggest why this works
> >
> > datafilename <-
> > "http://personality-project.org/r/datasets/maps.mixx.epi.bfi.data"
> > person.data <- read.table(datafilename,header=TRUE)
> >
> > # but this does not?
> >
> > dd <-
> > "https://sites.google.com/site/jrkrideau/home/general-stores/trees.txt"
> > treedata <- read.table(dd, header=TRUE)
> >
> > ===================================================================
> >
> > Error in file(file, "rt") : cannot open the connection
> >
>
> Your original file is no longer there, but when I try RCurl with a png file
> that is present, I get a certificate error:
>
> Dieter
>
> --------
> library(RCurl)
> sessionInfo()
> dd <-
> "https://sites.google.com/site/jrkrideau/home/general-stores/history.png"
> x = getBinaryURL(dd)
>
> -------------
> > sessionInfo()
> R version 2.12.1 (2010-12-16)
> Platform: i386-pc-mingw32/i386 (32-bit)
>
> locale:
> [1] LC_COLLATE=German_Germany.1252 LC_CTYPE=German_Germany.1252
> [3] LC_MONETARY=German_Germany.1252 LC_NUMERIC=C
> [5] LC_TIME=German_Germany.1252
>
> attached base packages:
> [1] stats graphics grDevices datasets utils methods base
>
> other attached packages:
> [1] RCurl_1.5-0.1 bitops_1.0-4.1
>
> loaded via a namespace (and not attached):
> [1] tools_2.12.1
>
> > dd <-
> > "https://sites.google.com/site/jrkrideau/home/general-stores/history.png"
>
> > x = getBinaryURL(dd)
> Error in curlPerform(curl = curl, .opts = opts, .encoding = .encoding) :
> SSL certificate problem, verify that the CA cert is OK. Details:
> error:14090086:SSL routines:SSL3_GET_SERVER_CERTIFICATE:certificate verify failed
>
>
I think I replied only to the OP, using wget, but presumably RCurl has an option
similar to "-k" in the command-line version. Network I/O is unpredictable, so it
really pays to have a few external tools on hand from time to time.
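For the RCurl side, here is a minimal sketch. It assumes the RCurl package is installed and that libcurl's certificate check can be switched off by passing the curl option ssl.verifypeer = FALSE to getURL()/getBinaryURL(), which is roughly what "-k" does on the command line. The helper name read_table_insecure is hypothetical. Switching verification off means the server's identity is no longer checked, so pointing libcurl at a CA bundle via cainfo is the safer route where it works.

```r
library(RCurl)

# Hypothetical helper: fetch a text file over HTTPS without verifying the
# server certificate (roughly RCurl's counterpart of `curl -k`), then parse
# the returned text with read.table().
read_table_insecure <- function(url, ...) {
  txt <- getURL(url, ssl.verifypeer = FALSE)  # whole response as one string
  con <- textConnection(txt)                  # embedded newlines become lines
  on.exit(close(con))
  read.table(con, ...)
}

# Usage (needs network access):
# treedata <- read_table_insecure(dd, header = TRUE)
#
# Binary files such as Dieter's PNG can take the same option:
# x <- getBinaryURL(dd, ssl.verifypeer = FALSE)
#
# Safer alternative: keep verification on and give libcurl a CA bundle,
# assumed here to be the cacert.pem shipped with RCurl:
# x <- getBinaryURL(dd, cainfo = system.file("CurlSSL", "cacert.pem",
#                                            package = "RCurl"))
```

Note that if the server answers with an error page (as with the 404 for trees.txt shown below), read.table() will try to parse that HTML, so a wrong-looking result is worth checking against the URL first.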

$ wget -O xxx -S -v --no-check-certificate --user-agent="Mozilla5.0" "http://sites.google.com/site/jrkrideau/home/general-stores/trees.txt"
--2011-01-06 16:00:01--  http://sites.google.com/site/jrkrideau/home/general-stores/trees.txt
Resolving sites.google.com (sites.google.com)... 74.125.229.3, 74.125.229.5, 74.125.229.13, ...
Connecting to sites.google.com (sites.google.com)|74.125.229.3|:80... connected.

HTTP request sent, awaiting response...
  HTTP/1.0 404 Not Found
  Content-Type: text/html; charset=utf-8
  Date: Thu, 06 Jan 2011 22:00:05 GMT
  Expires: Thu, 06 Jan 2011 22:00:05 GMT
  Cache-Control: private, max-age=0
  X-Content-Type-Options: nosniff
  X-XSS-Protection: 1; mode=block
  Server: GSE
2011-01-06 16:00:01 ERROR 404: Not Found.


$ wget -O xxx -S -v --no-check-certificate --user-agent="Mozilla5.0" "http://sites.google.com/site/jrkrideau/home/general-stores/history.png"
--2011-01-07 05:43:00--  http://sites.google.com/site/jrkrideau/home/general-stores/history.png
Resolving sites.google.com (sites.google.com)... 74.125.229.11, 74.125.229.6, 74.125.229.14, ...
Connecting to sites.google.com (sites.google.com)|74.125.229.11|:80... connected.
HTTP request sent, awaiting response...
  HTTP/1.0 200 OK
  Content-Type: image/png
  X-Robots-Tag: noarchive
  Cache-Control: no-cache, no-store, max-age=0, must-revalidate
  Pragma: no-cache
  Expires: Fri, 01 Jan 1990 00:00:00 GMT
  Date: Fri, 07 Jan 2011 11:43:04 GMT
  Last-Modified: Wed, 28 Oct 2009 18:58:56 GMT
  ETag: "1256756336889"
  Content-Length: 3817
  X-Content-Type-Options: nosniff
  X-XSS-Protection: 1; mode=block
  Server: GSE
  Connection: Keep-Alive
Length: 3817 (3.7K) [image/png]
Saving to: `xxx'

100%[======================================>] 3,817       --.-K/s   in 0s

2011-01-07 05:43:00 (30.8 MB/s) - `xxx' saved [3817/3817]


$

$ curl -o xxx -k "http://sites.google.com/site/jrkrideau/home/general-stores/history.png"
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  3817  100  3817    0     0  28916      0 --:--:-- --:--:-- --:--:-- 40606

$
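The external tools can also be driven from inside R, so that read.table() only ever sees a local file. This is a minimal sketch, assuming a curl binary is on the PATH; fetch_with_curl is a hypothetical helper name. The -f flag makes curl exit with an error on an HTTP error status, such as the 404 wget reported for trees.txt above, instead of saving the error page as if it were data.

```r
# Hypothetical helper: download `url` with the external curl program and
# return the path of the local copy.
# Flags: -s -S  quiet, but still print errors
#        -f     fail (non-zero exit) on HTTP errors such as 404
#        -k     skip certificate verification (as in the session above)
#        -L     follow redirects
fetch_with_curl <- function(url, destfile = tempfile()) {
  status <- system2("curl",
                    c("-s", "-S", "-f", "-k", "-L",
                      "-o", shQuote(destfile), shQuote(url)))
  if (status != 0) {
    stop("curl failed with exit status ", status, " for ", url)
  }
  destfile
}

# Usage (needs network access and curl on the PATH):
# f <- fetch_with_curl("http://sites.google.com/site/jrkrideau/home/general-stores/trees.txt")
# treedata <- read.table(f, header = TRUE)
```

Keeping the download and the parsing separate makes it easy to inspect the downloaded file when read.table() complains.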


