[Rd] Connections to https: URLs -- IE expert help needed

Prof Brian Ripley ripley at stats.ox.ac.uk
Fri Jan 5 11:00:09 CET 2007


On Mon, 1 Jan 2007, Duncan Temple Lang wrote:

> Kurt Hornik wrote:
>>>>>>> Duncan Temple Lang writes:
>>
>>
>>> Prof Brian Ripley wrote:
>>>> I've added to R-devel the ability to use download.file() and url() to
>>>> https: URLs, *only* if --internet2 is used on Windows.
>>>>
>>>> This uses the Internet Explorer internals, and only works if the
>>>> certificate is accepted (so e.g. does not work for
>>>> https://svn.r-project.org).
>>>>
>>>> Now I use IE (and Windows for that matter) only when really necessary, and
>>>> Firefox has simple ways to permanently accept non-verifiable certificates.
>>>> I would be grateful if someone who is much more familiar with IE could
>>>> write a note explaining how to deal with this that we could add to the
>>>> rw-FAQ.
>>>>
>>>> To forestall the inevitable question: there are no plans to add https:
>>>> support on any other platform, but it is something that would make a nice
>>>> project for a user contribution.  The current internal code is based on
>>>> libxml2, and that AFAICS still does not have https: support.
>>>>
>>
>>> Generally (i.e. not in particular response to Brian but related to
>>> this thread)
>>
>> With a similar disclaimer: Brian's efforts were triggered by me asking
>> how to use url() to read R's mailing list archive files, such as
>>
>>   https://stat.ethz.ch/pipermail/r-help/2007-January.txt.gz
>>
>> directly into R.  Turns out we cannot ... which, in a way, is a shame
>> ("R cannot read its own web pages") :-(
>
> Indeed, it is a shame.  Although, when I process mail messages,
> I use Perl's very rich collection of modules for processing
> mail in so many different formats. And then I use RSPerl
> to control this and get the data into R pretty quickly.
> So we can do it from R, and delegating to mail-processing
> software is probably a good idea given the number of special
> cases, etc.
>
> And even if we had HTTPS in R, we would still have to deal with
> the certificate on that page, which brings in more details.
> Which is the reason I think leaving things to libcurl,
> libwww, etc. will be best as they continue to evolve
> to handle new protocols and settings.

The issue here is the same as it ever was, that of event-loops and not 
blocking the R process.  I think that is where the missing extensibility 
is, and it has been raised for at least 6 years now.

If I try to get that example URI with RCurl it

1) blocks the R process for a long time.
2) fails to retrieve the URI as it is unable to handle the certificate.

Can you please point us to an extension package that behaves better?

[When Kurt first sent me the example, I was surprised that wget handled 
it. I then checked, and wget < 1.10 does not check certificates at all.]

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595


