[R] Behaviour of 'source' with URLs and proxy

Renaud Gaujoux renaud at mancala.cbio.uct.ac.za
Wed Oct 5 15:07:24 CEST 2011


So source() always reads a URL using the internal method, because it 
reads it chunk by chunk, which I suppose the other methods of 
download.file (wget, etc.) do not support (?).
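Since source() always goes through a url() connection, one way to use the wget method for the script as well is to download it first and then source the local copy. Below is a minimal sketch, assuming a recent R where download.file() accepts method = "auto". The name source_via_download is hypothetical and not part of R:

```r
# Hypothetical helper (not part of R): download a remote script with
# download.file(), which honours options(download.file.method = ...),
# then source() the local copy so source() never opens a url() connection.
source_via_download <- function(address,
                                method = getOption("download.file.method", "auto"),
                                envir = globalenv(), ...) {
  dest <- tempfile(fileext = ".R")
  on.exit(unlink(dest), add = TRUE)  # delete the temporary copy afterwards
  status <- download.file(address, destfile = dest, method = method,
                          quiet = TRUE)
  if (status != 0L) {
    stop("download of '", address, "' failed with status ", status)
  }
  source(dest, local = envir, ...)
}

# Usage, e.g. in .Rprofile (host elided as in the original report):
# options(download.file.method = "wget")
# source_via_download("http://*OUR_HOST*/~renaud/R/setrepos.R")
```

The design choice is simply that download.file() is the function that honours download.file.method, whereas source() ignores it.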

I guess the only way of finding out where the reading process gets stuck 
is to get into the C code and add more tracking messages. Will try this.
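A lighter first step than patching the C code might be to trace the read at R level. Below is a minimal sketch; trace_url_read is a hypothetical helper, not part of R. It opens the address with url(), which is what file() uses for URL descriptions inside source(). It then reads a few lines at a time and prints each warning (including the internet.info messages) as soon as it is raised, rather than only after the call finishes or is interrupted. If the output stops after "connection opened" with no "read ... lines" message, the first readLines() call is where it blocks. internet.info only affects the internal method, so on R versions whose url() uses another method it may print nothing extra.

```r
# Hypothetical diagnostic helper (not part of R): open `address` with url()
# and read it `chunk` lines at a time, reporting progress and any warnings
# (e.g. internet.info messages) immediately.
trace_url_read <- function(address, chunk = 10L) {
  old <- options(internet.info = 0)  # verbose messages (internal method only)
  on.exit(options(old), add = TRUE)
  con <- NULL
  on.exit(if (!is.null(con)) close(con), add = TRUE)
  lines <- character(0)
  withCallingHandlers({
    con <- url(address, open = "r")
    message("connection opened")
    repeat {
      got <- readLines(con, n = chunk, warn = FALSE)
      if (length(got) == 0L) break  # end of input
      lines <- c(lines, got)
      message(sprintf("read %d lines so far", length(lines)))
    }
    message("reached end of input")
  }, warning = function(w) {
    # Show each warning now instead of deferring it to the end, which
    # matters when the call has to be interrupted with Ctrl-C.
    message("[warning] ", conditionMessage(w))
    invokeRestart("muffleWarning")
  })
  lines
}
```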

Thank you.

On 05/10/2011 14:49, Prof Brian Ripley wrote:
> On Wed, 5 Oct 2011, Renaud Gaujoux wrote:
>
>>
>> On 05/10/2011 13:45, Prof Brian Ripley wrote:
>>> On Wed, 5 Oct 2011, Renaud Gaujoux wrote:
>>>
>>>> From the help page ?file I -- had -- read the following:
>>>>
>>>> "For ‘url’ the description is a complete URL, including scheme
>>>> (such as ‘http://’, ‘ftp://’ or ‘file://’). Proxies can be
>>>> specified for HTTP and FTP ‘url’ connections: see ‘download.file’."
>>>
>>> So you should have known that it was the same as url()!
>>
>> I agree. I just thought -- incorrectly -- that any attempt to 
>> download a file from R would eventually call the same C code as 
>> download.file. Or maybe
>
> It does.  But download.file(method="wget") does not call that C code ....
>
>> source() does not download and source, but reads the file on the fly?
>
> That is true too, but then downloading a file is always done in chunks.
>
>>>
>>>> From the internet.info messages it seems that the proxy is actually 
>>>> used, but somehow differently than what download.file does (via wget).
>>>
>>> No, somewhat differently than *wget* does.  As that help page says, 
>>> the section on proxies only refers to the internal method.
>>>
>>>> Is source supposed to work through a proxy?
>>>
>>> Yes, and it has been tested to do so.  But not tested on your proxy 
>>> ....
>>
>> OK, I agree that my settings look special, but in the end it is 
>> supposed to be a plain local proxy with no authentication.
>> The proxy is effectively used by the internal method and, from the 
>> messages (below), the remote file is opened, http headers are 
>> returned, but nothing else happens and I have to cancel the command 
>> (Ctrl-C).
>>
>> This is where I would like to have some input, so that I can work out 
>> the issue.
>> I tried to go through the C code of the internet module without much 
>> luck: it seems that in_R_HTTPRead and RxmlNanoHTTPRead would be the 
>> places to look.
>>
>> Any idea on what would cause these functions to hang (infinite loop, 
>> communication problem, ...)?
>> I know, I am too curious.
>>
>> Thank you
>>
>>
>>> Sys.getenv('http_proxy')
>> [1] "http://localhost:8080/"
>>> Sys.getenv('no_proxy')
>> [1] "localhost,127.0.0.0/8,*.local"
>>> options(internet.info=0)
>>> download.file("http://lib.stat.cmu.edu/datasets/csb/ch3a.txt")
>> trying URL 'http://lib.stat.cmu.edu/datasets/csb/ch3a.txt'
>> Content type 'text/plain' length 1209 bytes
>> opened URL
>> ^C
>> There were 15 warnings (use warnings() to see them)
>>> warnings()
>> Warning messages:
>> 1: In download.file("http://lib.stat.cmu.edu/datasets/csb/ch3a.txt",  ... :
>>  connected to 'localhost' on port 8080.
>> 2: In download.file("http://lib.stat.cmu.edu/datasets/csb/ch3a.txt",  ... :
>>  -> (Proxy) GET http://lib.stat.cmu.edu/datasets/csb/ch3a.txt HTTP/1.0
>> Host: lib.stat.cmu.edu
>> User-Agent: R (2.13.2 x86_64-pc-linux-gnu x86_64 linux-gnu)
>>
>> 3: In download.file("http://lib.stat.cmu.edu/datasets/csb/ch3a.txt",  ... :
>> <- HTTP/1.1 200 OK
>> 4: In download.file("http://lib.stat.cmu.edu/datasets/csb/ch3a.txt",  ... :
>> <- Via: 1.1 SRVWINTMG003, 1.1 SRVWINTMG004
>> 5: In download.file("http://lib.stat.cmu.edu/datasets/csb/ch3a.txt",  ... :
>> <- Connection: Keep-Alive
>> 6: In download.file("http://lib.stat.cmu.edu/datasets/csb/ch3a.txt",  ... :
>> <- Proxy-Connection: Keep-Alive
>> 7: In download.file("http://lib.stat.cmu.edu/datasets/csb/ch3a.txt",  ... :
>> <- Content-Length: 1209
>> 8: In download.file("http://lib.stat.cmu.edu/datasets/csb/ch3a.txt",  ... :
>> <- Age: 747
>> 9: In download.file("http://lib.stat.cmu.edu/datasets/csb/ch3a.txt",  ... :
>> <- Date: Wed, 05 Oct 2011 11:52:25 GMT
>> 10: In download.file("http://lib.stat.cmu.edu/datasets/csb/ch3a.txt",  ... :
>> <- Content-Type: text/plain
>> 11: In download.file("http://lib.stat.cmu.edu/datasets/csb/ch3a.txt",  ... :
>> <- ETag: "5c700f3-4b9-399383c0"
>> 12: In download.file("http://lib.stat.cmu.edu/datasets/csb/ch3a.txt",  ... :
>> <- Server: Apache
>> 13: In download.file("http://lib.stat.cmu.edu/datasets/csb/ch3a.txt",  ... :
>> <- Accept-Ranges: bytes
>> 14: In download.file("http://lib.stat.cmu.edu/datasets/csb/ch3a.txt",  ... :
>> <- Last-Modified: Fri, 29 Jul 1994 14:21:11 GMT
>> 15: In download.file("http://lib.stat.cmu.edu/datasets/csb/ch3a.txt",  ... :
>>  Code 200, content-type 'text/plain'
>>
>>>
>>>
>>>>
>>>> -- 
>>>> Renaud Gaujoux
>>>> Computational Biology - University of Cape Town
>>>> South Africa
>>>>
>>>>
>>>> On 05/10/2011 12:26, Prof Brian Ripley wrote:
>>>>> On Wed, 5 Oct 2011, Renaud Gaujoux wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I am having trouble sourcing a file on our local network from R.
>>>>>> It looks like this file is not properly accessed by 'source', 
>>>>>> even though it can be downloaded with download.file (see my 
>>>>>> settings and some tests I did below). I ended up with a workaround, 
>>>>>> but I would like to understand what is going on.
>>>>>>
>>>>>> Don't source/readLines use the same mechanism as download.file 
>>>>>> to access URLs?
>>>>>
>>>>> No. They use url() connections. See ?file.
>>>>>
>>>>>>
>>>>>> Thank you.
>>>>>>
>>>>>> Renaud
>>>>>>
>>>>>> My setting:
>>>>>> - I am using R 2.13.2 on Ubuntu 11.04.
>>>>>> - I am accessing the internet through a proxy (set up with cntlm; I am 
>>>>>> not sure whether this is the issue, but I don't know how to check 
>>>>>> without it). This means that http_proxy='http://localhost:8080/'.
>>>>>> - We have local CRAN/Bioconductor mirrors that can be accessed 
>>>>>> without going through the proxy.
>>>>>> - My .Rprofile sources a file 'setrepos.R' on the local network, 
>>>>>> that sets all relevant repos to our local mirrors.
>>>>>>
>>>>>> From the shell:
>>>>>> - I can wget any URL (local or internet) from command line 
>>>>>> without a problem.
>>>>>> - In particular I can wget the file 'setrepos.R' from command line.
>>>>>>
>>>>>> Symptoms:
>>>>>> - with options(download.file.method='wget'), I can download any 
>>>>>> URL (local or internet) with download.file
>>>>>> - I _cannot_ source any local or internet URL if http_proxy is 
>>>>>> set. It simply freezes. Using internet.info=0 gives the following 
>>>>>> messages:
>>>>>> ############
>>>>>> Warning messages:
>>>>>> 1: In file(file, "r", encoding = encoding) :
>>>>>> using HTTP proxy 'http://localhost:8080/'
>>>>>> 2: In file(file, "r", encoding = encoding) :
>>>>>> connected to 'localhost' on port 8080.
>>>>>> 3: In file(file, "r", encoding = encoding) :
>>>>>> -> (Proxy) GET http://*OUR_HOST*/~renaud/R/setrepos.R HTTP/1.0
>>>>>> Host: *OUR_HOST*
>>>>>> Pragma: no-cache
>>>>>> User-Agent: R (2.13.2 x86_64-pc-linux-gnu x86_64 linux-gnu)
>>>>>>
>>>>>> 4: In file(file, "r", encoding = encoding) : <- HTTP/1.1 200 OK
>>>>>> 5: In file(file, "r", encoding = encoding) : <- Via: 1.1 SRVWINTMG004
>>>>>> 6: In file(file, "r", encoding = encoding) : <- Connection: Keep-Alive
>>>>>> 7: In file(file, "r", encoding = encoding) : <- Proxy-Connection: Keep-Alive
>>>>>> 8: In file(file, "r", encoding = encoding) : <- Content-Length: 1597
>>>>>> 9: In file(file, "r", encoding = encoding) :
>>>>>> <- Date: Wed, 05 Oct 2011 06:43:13 GMT
>>>>>> 10: In file(file, "r", encoding = encoding) : <- Content-Type: text/plain
>>>>>> 11: In file(file, "r", encoding = encoding) :
>>>>>> <- ETag: "30b8018-63d-4a627b821c980"
>>>>>> 12: In file(file, "r", encoding = encoding) :
>>>>>> <- Server: Apache/2.2.9 (Ubuntu) DAV/2 SVN/1.5.1 PHP/5.2.6-2ubuntu4.6 with Suhosin-Patch mod_python/3.3.1 Python/2.5.2 mod_ssl/2.2.9 OpenSSL/0.9.8g mod_perl/2.0.4 Perl/v5.10.0
>>>>>> 13: In file(file, "r", encoding = encoding) : <- Accept-Ranges: bytes
>>>>>> 14: In file(file, "r", encoding = encoding) :
>>>>>> <- Last-Modified: Mon, 20 Jun 2011 17:03:50 GMT
>>>>>> 15: In file(file, "r", encoding = encoding) : Code 200, content-type 'text/plain'
>>>>>> ############
>>>>>>
>>>>>> - Setting options(download.file.method='wget') before sourcing 
>>>>>> does not change the behaviour.
>>>>>> - However, I can source any local URL if http_proxy='', without 
>>>>>> changing download.file.method. But then download.file no longer 
>>>>>> works for internet URLs, since the proxy settings are 
>>>>>> wrong. I could set http_proxy='', then source, then restore the 
>>>>>> proxy settings and set options(download.file.method='wget'). But 
>>>>>> this is just a workaround and I would like to understand what is 
>>>>>> going on.
>>>>>>
>>>>>> Session Info:
>>>>>>
>>>>>> R version 2.13.2 (2011-09-30)
>>>>>> Platform: x86_64-pc-linux-gnu (64-bit)
>>>>>>
>>>>>> locale:
>>>>>> [1] LC_CTYPE=en_ZA.UTF-8 LC_NUMERIC=C
>>>>>> [3] LC_TIME=en_ZA.UTF-8 LC_COLLATE=en_ZA.UTF-8
>>>>>> [5] LC_MONETARY=C LC_MESSAGES=en_US.UTF-8
>>>>>> [7] LC_PAPER=en_ZA.UTF-8 LC_NAME=C
>>>>>> [9] LC_ADDRESS=C LC_TELEPHONE=C
>>>>>> [11] LC_MEASUREMENT=en_ZA.UTF-8 LC_IDENTIFICATION=C
>>>>>>
>>>>>> attached base packages:
>>>>>> [1] stats graphics grDevices utils datasets methods base
>>>>>>
>>>>>> other attached packages:
>>>>>> [1] devtools_0.4
>>>>>>
>>>>>> loaded via a namespace (and not attached):
>>>>>> [1] RCurl_1.6-10 tools_2.13.2
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> -- 
>>>>>>
>>>>>> Renaud Gaujoux
>>>>>> Computational Biology - University of Cape Town
>>>>>> South Africa
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> ______________________________________________
>>>>>> R-help at r-project.org mailing list
>>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>>>> PLEASE do read the posting guide 
>>>>>> http://www.R-project.org/posting-guide.html
>>>>>> and provide commented, minimal, self-contained, reproducible code.
>>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>> ###
>>>>
>>>> UNIVERSITY OF CAPE TOWN This e-mail is subject to the UCT ICT 
>>>> policies and e-mail disclaimer published on our website at 
>>>> http://www.uct.ac.za/about/policies/emaildisclaimer/ or obtainable 
>>>> from +27 21 650 9111. This e-mail is intended only for the 
>>>> person(s) to whom it is addressed. If the e-mail has reached you in 
>>>> error, please notify the author. If you are not the intended 
>>>> recipient of the e-mail you may not use, disclose, copy, redirect 
>>>> or print the content. If this e-mail is not related to the business 
>>>> of UCT it is sent by the sender in the sender's individual capacity.
>>>>
>>>> ###
>>>>
>>>>
>>>
>>
>>
>>
>>
>>
>
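The workaround described in the original message (clear http_proxy, source, then restore the proxy and switch download.file to wget) can be wrapped in a small helper. Below is a minimal sketch; with_http_proxy is a hypothetical name, not part of R.

One caveat, hedged: the ?download.file help page notes that, for the internal method, the proxy environment variables must be set before the download code is first used and cannot be altered later with Sys.setenv(). So this should only be expected to help if it runs before anything else in the session has used the internal method, for example early in .Rprofile. That would also be consistent with the restored proxy only reaching wget, which is a separate process and inherits the current environment on each call.

```r
# Hypothetical helper (not part of R): evaluate `expr` with the environment
# variable `var` temporarily set to `value`, restoring the previous state
# (including "unset") on exit, even if `expr` fails.
with_http_proxy <- function(value, expr, var = "http_proxy") {
  old <- Sys.getenv(var, unset = NA)
  on.exit(
    if (is.na(old)) {
      Sys.unsetenv(var)
    } else {
      do.call(Sys.setenv, structure(list(old), names = var))
    },
    add = TRUE
  )
  do.call(Sys.setenv, structure(list(value), names = var))
  force(expr)  # `expr` is only evaluated here, with the new setting in place
}

# Usage sketch for .Rprofile (host elided as in the original report):
# with_http_proxy("", source("http://*OUR_HOST*/~renaud/R/setrepos.R"))
# options(download.file.method = "wget")
```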


