[R] RCurl much faster than base R

Martin Morgan mtmorgan at fredhutch.org
Tue Dec 9 16:06:48 CET 2014


On 12/05/2014 08:12 AM, Alex Gutteridge wrote:
> I'm trying to debug a curious network issue, I wonder if anyone can help me as I
> (and my local sysadmin) am stumped:
>
> This base R command takes ~1 minute to complete:
>
> readLines(url("http://bioconductor.org/biocLite.R"))
>
> (biocLite.R is a couple of KB in size)
>
> Using RCurl (and so libcurl under the hood) is instantaneous (<1s):
>
> library(RCurl)
> getURL("http://bioconductor.org/biocLite.R")
>
> I've not set it to use any proxies (which was my first thought) unless libcurl
> autodetects them somehow... And the speed is similarly fast using wget or curl
> on the command line. It just seems to be the base R commands which are slow
> (including install.packages etc...).
>
> Does anyone have hints on how to debug this (if not an answer directly)?
>

Hi Alex -- maybe not surprisingly, both approaches are approximately equally 
speedy for me, at least on average.

For what it's worth

- there is no need to use url(), just readLines("http://...")

It would help to

- provide the output of sessionInfo()

- check whether or not the problem is restricted to particular urls

- work through a simple example where the test 'works' when accessing a 
local http server (e.g., on the same machine and in a directory "mydir", run 
python -m SimpleHTTPServer 10000 in one terminal, then 
readLines("http://localhost:10000/<some file in 'mydir'>") in R) but fails at 
some increasingly remote point, e.g., when accessing a url outside your 
institution's firewall, which would indicate a firewall issue.
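
The last step could be sketched roughly as below. This is only an illustration: 
time_fetch() is a made-up helper name, and the file name test.txt on the local 
server is an assumption; substitute whatever urls sit at increasing distance 
from your machine.

```r
# Time one fetch of each url; a failed fetch is recorded as NA instead of
# stopping the whole comparison.
time_fetch <- function(urls, fetch = function(u) readLines(u, warn = FALSE)) {
    elapsed <- vapply(urls, function(u) {
        tryCatch(
            system.time(fetch(u))[["elapsed"]],
            error = function(e) NA_real_
        )
    }, numeric(1))
    data.frame(url = urls, seconds = unname(elapsed), stringsAsFactors = FALSE)
}

# Hypothetical targets, ordered from nearest to most remote.
urls <- c(
    "http://localhost:10000/test.txt",     # local server; file name is assumed
    "http://bioconductor.org/biocLite.R"   # url from the original report
)
# time_fetch(urls)   # uncomment to run against the network
```

Passing a different 'fetch' function (e.g., one wrapping RCurl::getURL) would 
time the libcurl route over the same urls.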

Maybe at the end of this exercise the only insight will be that the R and curl 
implementations differ (a known known!).

Also if this is really a problem with installing Bioconductor packages rather 
than a general R question, then https://support.bioconductor.org is a better 
place to post. If the problem is restricted to bioconductor.org, then: (a) for 
your sysadmin, the url is redirected (via DNS, not http:) to Amazon CloudFront 
and from there to a regional Amazon data center; I'm not sure what the 
significance of this might be, but, e.g., the admin might have throttled 
download speeds from certain ip address ranges; and (b) if you're in Europe or 
elsewhere, are trying to install Bioconductor packages, and the regional data 
center is not fast enough (it should be responsive, at least when the url has 
been seen 'recently'), then configure R to use a local mirror from 
http://bioconductor.org/about/mirrors/, e.g.,

     chooseBioCmirror()
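
chooseBioCmirror() is interactive; in a script the same setting can probably be 
made by setting the BioC_mirror option directly. A minimal sketch, where the 
mirror url is a placeholder rather than a real mirror:

```r
# Hypothetical mirror url; replace with one listed on the mirrors page.
mirror <- "https://mirror.example.org/bioconductor"

# options() returns the previous value, so it can be restored later.
old <- options(BioC_mirror = mirror)
getOption("BioC_mirror")

# options(old)   # restore the earlier setting if needed
```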

Martin Morgan
Bioconductor

> AlexG


-- 
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M1 B861
Phone: (206) 667-2793


