[R] Behaviour of 'source' with URLs and proxy

Renaud Gaujoux renaud at mancala.cbio.uct.ac.za
Wed Oct 5 14:46:13 CEST 2011


On 05/10/2011 13:45, Prof Brian Ripley wrote:
> On Wed, 5 Oct 2011, Renaud Gaujoux wrote:
>
>> From the help page ?file I -- had -- read the following:
>>
>> "For ‘url’ the description is a complete URL, including scheme
>> (such as ‘http://’, ‘ftp://’ or ‘file://’). Proxies can be
>> specified for HTTP and FTP ‘url’ connections: see ‘download.file’."
>
> So you should have known that it was the same as url()!

I agree. I just thought -- incorrectly -- that any attempt to download a 
file from R would eventually call the same C code as download.file. Or 
maybe source() does not download and source, but reads the file on the fly?
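
To make the distinction concrete, here is a minimal sketch of the "download, then source" path, as opposed to passing the URL straight to source(), which reads it through a url() connection. The helper name source_via_download and its arguments are hypothetical and not part of R; only download.file(), tempfile(), unlink() and source() are real functions.

```r
# Hypothetical helper (not part of R): fetch a script to a temporary file
# with download.file(), then source the local copy instead of the URL.
source_via_download <- function(url,
                                method = getOption("download.file.method", "auto"),
                                ...) {
  tmp <- tempfile(fileext = ".R")
  on.exit(unlink(tmp), add = TRUE)          # remove the temporary copy afterwards
  status <- download.file(url, destfile = tmp, method = method, quiet = TRUE)
  if (status != 0) stop("download failed with status ", status)
  source(tmp, ...)                          # extra arguments go to source()
}
```

With this, something like source_via_download("http://*OUR_HOST*/~renaud/R/setrepos.R", method = "wget") would go through wget rather than through the internal url() code.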

>
>> From the internet.info messages it seems that the proxy is actually 
>> used, but somehow differently than what download.file does (via wget).
>
> No, somewhat differently than *wget* does.  As that help page says, 
> the section on proxies only refers to the internal method.
>
>> Is source supposed to work through a proxy?
>
> Yes, and it has been tested to do so.  But not tested on your proxy ....

OK, I agree that my settings look special, but in the end it is supposed 
to be a plain local proxy with no authentication.
The proxy is indeed used by the internal method and, from the 
messages (below), the remote file is opened and the HTTP headers are returned, 
but nothing else happens and I have to cancel the command (Ctrl-C).
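
One way to check whether the proxy hop itself is responsible is to repeat a call with http_proxy cleared and restore it afterwards. The sketch below assumes a hypothetical helper, without_http_proxy, which is not part of R. Note that ?download.file says that, for the internal code, the proxy environment variables must be set before the download code is first used, so in a running session this is probably only reliable before any other internet access.

```r
# Hypothetical helper (not part of R): evaluate `code` with http_proxy
# cleared, then put the previous value back, even if `code` fails.
without_http_proxy <- function(code) {
  old <- Sys.getenv("http_proxy", unset = NA)   # NA if the variable was not set
  Sys.setenv(http_proxy = "")
  on.exit(
    if (is.na(old)) Sys.unsetenv("http_proxy") else Sys.setenv(http_proxy = old),
    add = TRUE
  )
  force(code)  # lazy evaluation: `code` only runs here, after the proxy is cleared
}
```

For example, without_http_proxy(source("http://*OUR_HOST*/~renaud/R/setrepos.R")) would try the local URL directly.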

This is where I would like to have some input, so that I can work out 
the issue.
I tried to go through the C code for internet without much luck: it seems 
that in_R_HTTPRead and RxmlNanoHTTPRead would be the places to look at.

Any idea on what would cause these functions to hang (infinite loop, 
communication problem, ...)?
I know, I am too curious.
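
To narrow down where the read stalls, the URL can be read directly through a url() connection with a short timeout, which is the path source() takes. This is only a sketch: probe_url is a hypothetical helper, and it assumes that the "timeout" option applies to this code path.

```r
# Hypothetical diagnostic (not part of R): read a URL through a url()
# connection with a bounded timeout and report how many lines arrived.
probe_url <- function(u, timeout = 10) {
  old <- options(timeout = timeout, internet.info = 0)
  on.exit(options(old), add = TRUE)    # restore the previous options on exit
  con <- url(u, open = "r")            # an error here means the open itself failed
  on.exit(close(con), add = TRUE)
  lines <- readLines(con, warn = FALSE)
  length(lines)
}
```

If the call returns a line count for a direct URL but stops with an error or hangs when going through the proxy, the problem is in reading the body rather than in opening the connection.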

Thank you


 > Sys.getenv('http_proxy')
[1] "http://localhost:8080/"
 > Sys.getenv('no_proxy')
[1] "localhost,127.0.0.0/8,*.local"
 > options(internet.info=0)
 > download.file("http://lib.stat.cmu.edu/datasets/csb/ch3a.txt")
trying URL 'http://lib.stat.cmu.edu/datasets/csb/ch3a.txt'
Content type 'text/plain' length 1209 bytes
opened URL
^C
There were 15 warnings (use warnings() to see them)
 > warnings()
Warning messages:
1: In download.file("http://lib.stat.cmu.edu/datasets/csb/ch3a.txt",  ... :
   connected to 'localhost' on port 8080.
2: In download.file("http://lib.stat.cmu.edu/datasets/csb/ch3a.txt",  ... :
   -> (Proxy) GET http://lib.stat.cmu.edu/datasets/csb/ch3a.txt HTTP/1.0
Host: lib.stat.cmu.edu
User-Agent: R (2.13.2 x86_64-pc-linux-gnu x86_64 linux-gnu)

3: In download.file("http://lib.stat.cmu.edu/datasets/csb/ch3a.txt",  ... :
<- HTTP/1.1 200 OK
4: In download.file("http://lib.stat.cmu.edu/datasets/csb/ch3a.txt",  ... :
<- Via: 1.1 SRVWINTMG003, 1.1 SRVWINTMG004
5: In download.file("http://lib.stat.cmu.edu/datasets/csb/ch3a.txt",  ... :
<- Connection: Keep-Alive
6: In download.file("http://lib.stat.cmu.edu/datasets/csb/ch3a.txt",  ... :
<- Proxy-Connection: Keep-Alive
7: In download.file("http://lib.stat.cmu.edu/datasets/csb/ch3a.txt",  ... :
<- Content-Length: 1209
8: In download.file("http://lib.stat.cmu.edu/datasets/csb/ch3a.txt",  ... :
<- Age: 747
9: In download.file("http://lib.stat.cmu.edu/datasets/csb/ch3a.txt",  ... :
<- Date: Wed, 05 Oct 2011 11:52:25 GMT
10: In download.file("http://lib.stat.cmu.edu/datasets/csb/ch3a.txt",  ... :
<- Content-Type: text/plain
11: In download.file("http://lib.stat.cmu.edu/datasets/csb/ch3a.txt",  ... :
<- ETag: "5c700f3-4b9-399383c0"
12: In download.file("http://lib.stat.cmu.edu/datasets/csb/ch3a.txt",  ... :
<- Server: Apache
13: In download.file("http://lib.stat.cmu.edu/datasets/csb/ch3a.txt",  ... :
<- Accept-Ranges: bytes
14: In download.file("http://lib.stat.cmu.edu/datasets/csb/ch3a.txt",  ... :
<- Last-Modified: Fri, 29 Jul 1994 14:21:11 GMT
15: In download.file("http://lib.stat.cmu.edu/datasets/csb/ch3a.txt",  ... :
   Code 200, content-type 'text/plain'

>
>
>>
>> -- 
>> Renaud Gaujoux
>> Computational Biology - University of Cape Town
>> South Africa
>>
>>
>> On 05/10/2011 12:26, Prof Brian Ripley wrote:
>>> On Wed, 5 Oct 2011, Renaud Gaujoux wrote:
>>>
>>>> Hi,
>>>>
>>>> I am having trouble sourcing a file on our local network from R.
>>>> It looks like such files are not properly accessed by 'source', even 
>>>> though they can be downloaded with download.file. (See below for my 
>>>> settings and some tests I did.) I ended up with a workaround, but I would 
>>>> like to understand what is going on.
>>>>
>>>> Don't source/readLines use the same mechanism as download.file 
>>>> to access URLs?
>>>
>>> No. They use url() connections. See ?file.
>>>
>>>>
>>>> Thank you.
>>>>
>>>> Renaud
>>>>
>>>> My setting:
>>>> - I am using R 2.13.2 on Ubuntu 11.04.
>>>> - I am accessing internet through a proxy (set up with cntlm, not 
>>>> sure if this is the issue but I don't know how to check without 
>>>> it). This means that http_proxy='http://localhost:8080/'.
>>>> - We have local CRAN/BioConductor mirrors that can be accessed 
>>>> without going through the proxy.
>>>> - My .Rprofile sources a file 'setrepos.R' on the local network, 
>>>> that sets all relevant repos to our local mirrors.
>>>>
>>>> From the shell:
>>>> - I can wget any URL (local or internet) from command line without 
>>>> a problem.
>>>> - In particular I can wget the file 'setrepos.R' from command line.
>>>>
>>>> Symptoms:
>>>> - with options(download.file.method='wget'), I can download any URL 
>>>> (local or internet) with download.file
>>>> - I _cannot_ source any local or internet URL if http_proxy is set. 
>>>> It simply freezes. Using internet.info=0 gives the following messages:
>>>> ############
>>>> Warning messages:
>>>> 1: In file(file, "r", encoding = encoding) :
>>>> using HTTP proxy 'http://localhost:8080/'
>>>> 2: In file(file, "r", encoding = encoding) :
>>>> connected to 'localhost' on port 8080.
>>>> 3: In file(file, "r", encoding = encoding) :
>>>> -> (Proxy) GET http://*OUR_HOST*/~renaud/R/setrepos.R HTTP/1.0
>>>> Host: *OUR_HOST*
>>>> Pragma: no-cache
>>>> User-Agent: R (2.13.2 x86_64-pc-linux-gnu x86_64 linux-gnu)
>>>>
>>>> 4: In file(file, "r", encoding = encoding) : <- HTTP/1.1 200 OK
>>>> 5: In file(file, "r", encoding = encoding) : <- Via: 1.1 SRVWINTMG004
>>>> 6: In file(file, "r", encoding = encoding) : <- Connection: Keep-Alive
>>>> 7: In file(file, "r", encoding = encoding) : <- Proxy-Connection: 
>>>> Keep-Alive
>>>> 8: In file(file, "r", encoding = encoding) : <- Content-Length: 1597
>>>> 9: In file(file, "r", encoding = encoding) :
>>>> <- Date: Wed, 05 Oct 2011 06:43:13 GMT
>>>> 10: In file(file, "r", encoding = encoding) : <- Content-Type: 
>>>> text/plain
>>>> 11: In file(file, "r", encoding = encoding) :
>>>> <- ETag: "30b8018-63d-4a627b821c980"
>>>> 12: In file(file, "r", encoding = encoding) :
>>>> <- Server: Apache/2.2.9 (Ubuntu) DAV/2 SVN/1.5.1 
>>>> PHP/5.2.6-2ubuntu4.6 with Suhosin-Patch mod_python/3.3.1 
>>>> Python/2.5.2 mod_ssl/2.2.9 OpenSSL/0.9.8g mod_perl/2.0.4 Perl/v5.10.0
>>>> 13: In file(file, "r", encoding = encoding) : <- Accept-Ranges: bytes
>>>> 14: In file(file, "r", encoding = encoding) :
>>>> <- Last-Modified: Mon, 20 Jun 2011 17:03:50 GMT
>>>> 15: In file(file, "r", encoding = encoding) : Code 200, 
>>>> content-type 'text/plain'
>>>> ############
>>>>
>>>> - Setting options(download.file.method='wget') before sourcing does 
>>>> not change the behaviour.
>>>> - However, I can source any local URL if http_proxy='', without 
>>>> changing download.file.method. But then download.file no longer works 
>>>> for internet URLs, since the proxy settings are wrong. I 
>>>> could set http_proxy='', then source, then restore the proxy 
>>>> settings and set options(download.file.method='wget'). But this is 
>>>> just a workaround and I would like to understand what is going on.
>>>>
>>>> Session Info:
>>>>
>>>> R version 2.13.2 (2011-09-30)
>>>> Platform: x86_64-pc-linux-gnu (64-bit)
>>>>
>>>> locale:
>>>> [1] LC_CTYPE=en_ZA.UTF-8 LC_NUMERIC=C
>>>> [3] LC_TIME=en_ZA.UTF-8 LC_COLLATE=en_ZA.UTF-8
>>>> [5] LC_MONETARY=C LC_MESSAGES=en_US.UTF-8
>>>> [7] LC_PAPER=en_ZA.UTF-8 LC_NAME=C
>>>> [9] LC_ADDRESS=C LC_TELEPHONE=C
>>>> [11] LC_MEASUREMENT=en_ZA.UTF-8 LC_IDENTIFICATION=C
>>>>
>>>> attached base packages:
>>>> [1] stats graphics grDevices utils datasets methods base
>>>>
>>>> other attached packages:
>>>> [1] devtools_0.4
>>>>
>>>> loaded via a namespace (and not attached):
>>>> [1] RCurl_1.6-10 tools_2.13.2
>>>>
>>>>
>>>>
>>>>
>>>> -- 
>>>>
>>>> Renaud Gaujoux
>>>> Computational Biology - University of Cape Town
>>>> South Africa
>>>>
>>>>
>>>
>>
>>
>>
>> ###
>>
>> UNIVERSITY OF CAPE TOWN This e-mail is subject to the UCT ICT 
>> policies and e-mail disclaimer published on our website at 
>> http://www.uct.ac.za/about/policies/emaildisclaimer/ or obtainable 
>> from +27 21 650 9111. This e-mail is intended only for the person(s) 
>> to whom it is addressed. If the e-mail has reached you in error, 
>> please notify the author. If you are not the intended recipient of 
>> the e-mail you may not use, disclose, copy, redirect or print the 
>> content. If this e-mail is not related to the business of UCT it is 
>> sent by the sender in the sender's individual capacity.
>>
>> ###
>>
>>
>


