[Rd] Issues with libcurl + HTTP status codes (e.g. 403, 404)

Kevin Ushey kevinushey at gmail.com
Thu Aug 27 20:00:49 CEST 2015


Thanks for looking into this so promptly!

Should users expect the behaviour to be congruent across all of the
supported external programs (curl, wget) as well? E.g.

    URL <- "http://cran.rstudio.org/no/such/file/here.tar.gz"
    download <- function(file, method, ...)
      print(download.file(file, destfile = tempfile(), method = method, ...))

    download(URL, method = "internal") ## error
    download(URL, method = "curl") ## status code 0
    download(URL, method = "wget") ## warning (status code 8)
    download(URL, method = "libcurl") ## status code 0

It seems unfortunate that the behaviour differs between methods; at
least in my mind, `download.file()` should be a unified interface that
tries to do the 'same thing' regardless of the chosen method.
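To make the differences easier to compare, here is a minimal sketch that records how one `download.file()` call reports its outcome for a given method: an error, a warning, or a plain return status. `classify_download()` is a hypothetical helper, not part of R. Because `tryCatch()` stops at the first condition, a method that warns is reported as a warning even if it would also return a non-zero status.

```r
## Hypothetical helper (not part of R): report how a single
## download.file() call signals its outcome for a given method.
## tryCatch() stops at the first warning or error, so a method that
## warns is reported as "warning" even if it would also return a
## non-zero status.
classify_download <- function(url, method, ...) {
  tryCatch({
    status <- download.file(url, destfile = tempfile(), method = method,
                            quiet = TRUE, ...)
    paste("status", status)
  },
  warning = function(w) paste("warning:", conditionMessage(w)),
  error   = function(e) paste("error:", conditionMessage(e)))
}

## Usage (needs network access and the external curl/wget binaries):
## sapply(c("internal", "curl", "wget", "libcurl"),
##        classify_download, url = URL)
```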

FWIW, one can force 'curl' to fail on HTTP error codes with its -f
(--fail) flag, and R can pass that flag through via the `extra` argument, e.g.

    download(URL, method = "curl", extra = "-f") ## warning (status code 22)

but I still think a failed download should be an error rather than a
warning. (Of course, that would be a backwards-incompatible change;
however, I think it would be the correct one.)
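Until then, users who want errors can wrap the call themselves. The sketch below assumes the wrapper approach is acceptable and uses only the existing `download.file()` arguments. It adds curl's `-f` for `method = "curl"`, re-raises any warning as an error, and treats any remaining non-zero status as an error. `strict_download()` is a hypothetical name, not part of R. Promoting every warning is deliberately blunt, because it also catches warnings such as a truncated download.

```r
## Hypothetical helper (not part of R): a stricter wrapper around
## download.file() that always signals an R error on failure.
##  * For method = "curl" it adds curl's -f flag, so HTTP errors such as
##    404 give a non-zero exit status instead of 0.
##  * Any warning (e.g. wget's status 8) is re-raised as an error.
##  * Any remaining non-zero status is turned into an error.
strict_download <- function(url, destfile, method = "auto",
                            extra = getOption("download.file.extra"), ...) {
  if (identical(method, "curl"))
    extra <- c(extra, "-f")
  status <- withCallingHandlers(
    download.file(url, destfile = destfile, method = method,
                  extra = extra, ...),
    warning = function(w) stop(conditionMessage(w), call. = FALSE)
  )
  if (status != 0L)
    stop(sprintf("download of '%s' failed with status %s", url, status),
         call. = FALSE)
  invisible(status)
}
```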

(PS: I just tested r69197 and method = "libcurl" does indeed report an
error now in the above test case on my system [OS X]; thanks!)
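Martin's message below also notes that `download.file` creates `destfile` before it discovers that the HTTP request has failed, which leaves an empty file behind. A caller can sidestep this by downloading into a temporary file first and only copying it into place once the call succeeds. This is a minimal sketch under that assumption. `download_atomic()` is a hypothetical helper, not part of R.

```r
## Hypothetical helper (not part of R): download into a temporary file
## and copy it to 'destfile' only after download.file() has succeeded,
## so a failed request never leaves an empty 'destfile' behind.
download_atomic <- function(url, destfile, ...) {
  tmp <- tempfile(fileext = ".part")
  on.exit(unlink(tmp), add = TRUE)  # remove the partial file in all cases
  status <- download.file(url, destfile = tmp, ...)  # R errors propagate
  if (status != 0L)
    stop(sprintf("download of '%s' failed with status %s", url, status),
         call. = FALSE)
  ## file.copy() rather than file.rename(): tempdir() may live on a
  ## different file system from 'destfile'.
  if (!file.copy(tmp, destfile, overwrite = TRUE))
    stop("could not write '", destfile, "'", call. = FALSE)
  invisible(destfile)
}
```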

Kevin


On Thu, Aug 27, 2015 at 10:27 AM, Martin Morgan <mtmorgan at fredhutch.org> wrote:
> R-devel r69197 returns appropriate errors for the cases below. I know of a
> few rough edges:
>
> - ftp error codes are not reported correctly
> - download.file creates destfile before discovering that the HTTP request
> has failed, leaving an empty file on disk
>
> and am happy to hear of more.
>
> Martin
>
>
> On 08/27/2015 08:46 AM, Jeroen Ooms wrote:
>>
>> On Thu, Aug 27, 2015 at 5:16 PM, Martin Maechler
>> <maechler at stat.math.ethz.ch> wrote:
>>>
>>> Probably I'm confused now...
>>> Both R-patched and R-devel give an error (after a *long* wait!)
>>> for
>>>         download.file("https://someserver.com/mydata.csv", "mydata.csv")
>>>
>>> So that problem is, I think, solved now.
>>
>>
>> I'm sorry for the confusion; this was a hypothetical example.
>> Connection failures are different from HTTP status errors. Below are some
>> real examples of servers returning HTTP errors. For each example the
>> "internal" method correctly raises an R error, whereas the "libcurl"
>> method does not.
>>
>> # File not found (404)
>> download.file("http://httpbin.org/data.csv", "data.csv", method = "internal")
>> download.file("http://httpbin.org/data.csv", "data.csv", method = "libcurl")
>> readLines(url("http://httpbin.org/data.csv", method = "internal"))
>> readLines(url("http://httpbin.org/data.csv", method = "libcurl"))
>>
>> # Unauthorized (401)
>> download.file("https://httpbin.org/basic-auth/user/passwd", "data.csv", method = "internal")
>> download.file("https://httpbin.org/basic-auth/user/passwd", "data.csv", method = "libcurl")
>> readLines(url("https://httpbin.org/basic-auth/user/passwd", method = "internal"))
>> readLines(url("https://httpbin.org/basic-auth/user/passwd", method = "libcurl"))
>>
>
>
> --
> Computational Biology / Fred Hutchinson Cancer Research Center
> 1100 Fairview Ave. N.
> PO Box 19024 Seattle, WA 98109
>
> Location: Arnold Building M1 B861
> Phone: (206) 667-2793
>
> ______________________________________________
> R-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel


