[Rd] download.file(method="libcurl") and GET vs. HEAD requests

Wed Jun 22 04:45:11 CEST 2016

On 06/21/2016 09:35 PM, Winston Chang wrote:
> In R 3.2.4, if you ran download.file(method="libcurl"), it issued an
> HTTP GET request for the file. However, in R 3.3.0, it issues an HTTP
> HEAD request first, and then a GET request. This can result in problems
> when the web server gives an error for a HEAD request, even if the
> file is available with a GET request.
>
> Is it possible to tell download.file to simply send a GET request,
> without first sending a HEAD request?
>
>
> In theory, web servers should give the same response for HEAD and GET
> requests, except that for a HEAD request they send only the headers
> and not the content. However, not all web servers do this for all files.
> I've seen this problem come up in two different places.
>
> The first is from an issue that someone filed for the downloader
> package. The following works in R 3.2.4, but in R 3.3.0, it fails with
> a 404 (tested on a Mac):
>    options(internet.info=1) # Show verbose download info
>    url <- "https://census.edina.ac.uk/ukborders/easy_download/prebuilt/shape/England_lad_2011_gen.zip"
>    download.file(url, destfile = "out.zip", method="libcurl")
>
> In R 3.3.0, the download succeeds with method="wget", and
> method="curl". It's only method="libcurl" that has problems.
>
>
> The second place I've encountered a problem is in downloading attached
> files from a GitHub release.
>    options(internet.info=1) # Show verbose download info
>    url <- "https://github.com/wch/webshot/releases/download/v0.3/phantomjs-2.1.1-macosx.zip"
>    download.file(url, destfile = "out.zip")
>
> This one fails with a 403 Forbidden because it gets redirected to a
> URL in Amazon S3, where a signature of the file is embedded in the
> URL. However, the signature is computed with the request type (HEAD
> vs. GET), and so the same URL doesn't work for both. (See
> http://stackoverflow.com/a/20580036/412655)
>
> Any help would be appreciated!

I think I introduced this, in

------------------------------------------------------------------------
r69280 | morgan | 2015-09-03 06:24:49 -0400 (Thu, 03 Sep 2015) | 4 lines

don't create empty file on 404 and similar errors

- download.file(method="libcurl")

------------------------------------------------------------------------

The idea was to test that the file can be downloaded before trying to 
download it; previously R would download the error page as though it 
were the content.
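
For comparison, one way to avoid leaving a bogus file behind without a
preliminary HEAD request is to download into a temporary file and move it
into place only on success. The following is a minimal sketch, not what
r69280 does: safe_download() and its `downloader` argument are
hypothetical names introduced for illustration, and the sketch assumes the
download function signals an error or returns a non-zero status on failure.

```r
# Hypothetical helper, not part of base R: fetch into a temporary file and
# move it to `destfile` only after the transfer reported success.
# `downloader` is an assumption added so the logic can be exercised offline;
# by default it is utils::download.file.
safe_download <- function(url, destfile,
                          downloader = utils::download.file, ...) {
  # Create the temporary file next to the destination so that the final
  # rename stays on the same file system.
  tmp <- tempfile(tmpdir = dirname(destfile))
  on.exit(unlink(tmp), add = TRUE)  # clean up partial or error content

  status <- tryCatch(
    downloader(url, destfile = tmp, ...),
    error = function(e) e
  )

  if (inherits(status, "error")) {
    stop("download failed, destination left untouched: ",
         conditionMessage(status), call. = FALSE)
  }
  # download.file() returns 0 (invisibly) on success.
  if (!identical(as.integer(status), 0L)) {
    stop("download returned a non-zero status, destination left untouched",
         call. = FALSE)
  }
  if (!file.rename(tmp, destfile)) {
    stop("could not move the downloaded file to '", destfile, "'",
         call. = FALSE)
  }
  invisible(0L)
}
```

A call would look like safe_download(url, destfile = "out.zip",
method = "libcurl"). Note that this wrapper does not change which requests
download.file() itself sends; it only illustrates the "don't create a file
on error" goal.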

I'll give this some thought.

Martin Morgan

> -Winston
