[Rd] CRAN indices out of whack (for at least macOS)

Thierry Onkelinx thierry.onkelinx at inbo.be
Mon Feb 5 11:31:45 CET 2018


Another benefit of Winston's proposal is that it makes it easy to
install specific package versions from source. For the time being I'm
using a construct like
https://github.com/inbo/Rstable/blob/master/cran_install.sh to
generate a Docker image.
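A minimal sketch of why pinning a source version currently needs two URL patterns. It assumes CRAN's present source layout (current release in src/contrib/, older releases in src/contrib/Archive/<pkg>/). The function names are hypothetical, and the code is not taken from cran_install.sh. flat_source_url is also hypothetical: it only illustrates a single-directory layout.

```python
CRAN = "https://cran.r-project.org"

def source_url(pkg, version, current_version):
    """Source tarball URL under the current split layout.

    The current release sits in src/contrib/, older releases in
    src/contrib/Archive/<pkg>/, so the caller must know which one it wants.
    """
    tarball = f"{pkg}_{version}.tar.gz"
    if version == current_version:
        return f"{CRAN}/src/contrib/{tarball}"
    return f"{CRAN}/src/contrib/Archive/{pkg}/{tarball}"

def flat_source_url(pkg, version):
    """HYPOTHETICAL: URL if every version lived in one directory (no lookup needed)."""
    return f"{CRAN}/src/contrib/{pkg}_{version}.tar.gz"

print(source_url("digest", "0.6.14", "0.6.15"))
# https://cran.r-project.org/src/contrib/Archive/digest/digest_0.6.14.tar.gz
```

The extra `current_version` argument is the cost of the split layout. A script must look up the current version (e.g. from PACKAGES) just to build the right URL. A URL built for the current version also stops working once a newer release moves that tarball into Archive/.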

Best regards,

ir. Thierry Onkelinx
Statisticus / Statistician

Vlaamse Overheid / Government of Flanders
INSTITUUT VOOR NATUUR- EN BOSONDERZOEK / RESEARCH INSTITUTE FOR NATURE
AND FOREST
Team Biometrie & Kwaliteitszorg / Team Biometrics & Quality Assurance
thierry.onkelinx at inbo.be
Havenlaan 88 bus 73, 1000 Brussel
www.inbo.be

///////////////////////////////////////////////////////////////////////////////////////////
To call in the statistician after the experiment is done may be no
more than asking him to perform a post-mortem examination: he may be
able to say what the experiment died of. ~ Sir Ronald Aylmer Fisher
The plural of anecdote is not data. ~ Roger Brinner
The combination of some data and an aching desire for an answer does
not ensure that a reasonable answer can be extracted from a given body
of data. ~ John Tukey
///////////////////////////////////////////////////////////////////////////////////////////




2018-02-03 20:31 GMT+01:00 Winston Chang <winstonchang1 at gmail.com>:
> Although it may not have been the cause of this particular index
> inconsistency, there are other causes of intermittent index
> inconsistencies. They could be avoided if there were a different
> directory structure on CRAN servers.
>
> One of the causes of inconsistencies is caching. With
> cloud.r-project.org (note that this is not cran.r-project.org),
> there is a CDN in front of the server; the CDN has caching endpoints
> around the world, and will serve files to the user from the nearest
> endpoint.
>
> The cache timeout for each file is 30 minutes. Suppose a user
> downloads file X from some endpoint at 1:00. If the endpoint doesn't
> already have X in the cache, then it will fetch the file from the
> server, and then send it to the user. The endpoint will consider the
> cached file valid until 1:30. If another user requests X at 1:20, the
> endpoint will serve up the file from its cache without checking with
> the server. If someone requests X at 1:40, the endpoint will check
> with the server to see if its cached version is still valid (and
> download an updated version if necessary), then it will send the file
> to the user.
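The 30-minute, per-file cache described above can be modeled as a toy endpoint. This is a minimal sketch under several assumptions. Times are minutes since midnight (60 = 1:00). An expired or missing entry is simply re-fetched, a simplification of real HTTP revalidation. A 404 is not cached. CachingEndpoint is a hypothetical name, not a CDN API.

```python
TTL_MINUTES = 30  # cache timeout per file, as stated in the message

class CachingEndpoint:
    """Toy model of one CDN endpoint that caches each file independently."""

    def __init__(self, origin, ttl=TTL_MINUTES):
        self.origin = origin  # dict: filename -> content on the server
        self.ttl = ttl
        self.cache = {}       # filename -> (content, expires_at)

    def get(self, name, now):
        """Serve `name` at time `now` (minutes); return None for a 404."""
        entry = self.cache.get(name)
        if entry is not None and now < entry[1]:
            return entry[0]   # still fresh: the server is not contacted
        # Missing or expired: go back to the server (simplified revalidation).
        if name not in self.origin:
            self.cache.pop(name, None)
            return None
        self.cache[name] = (self.origin[name], now + self.ttl)
        return self.origin[name]

server = {"X": "v1"}
edge = CachingEndpoint(server)
first = edge.get("X", 60)    # 1:00 fetched from the server, valid until 1:30
server["X"] = "v2"           # the server's copy changes
second = edge.get("X", 80)   # 1:20 still "v1", served from the cache
third = edge.get("X", 100)   # 1:40 expired, so "v2" is fetched
```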
>
> Because the caching is on a per-file basis, this can lead to a
> situation where the PACKAGES file served by an endpoint is out of sync
> with the .tgz package files. Imagine this scenario:
>
> 1:00 Someone downloads PACKAGES. It is not yet in the endpoint's
> cache, so it fetches it from the server. This version of PACKAGES says
> that the current version of PkgA is 1.0.
> 1:10 The server performs an rsync from the central CRAN mirror. It
> gets an updated version of PACKAGES, which says that the current
> version of PkgA is 2.0. The rsync also removes the PkgA_1.0.tgz file
> and adds PkgA_2.0.tgz.
> 1:20 Someone else wants to install PkgA, so their R session first
> downloads PACKAGES, which points to PkgA_1.0.tgz. Then R tries to
> download PkgA_1.0.tgz; it is not in the endpoint's cache, so the
> endpoint tries to fetch it from the server, but the file is not
> present there so it sends a 404 missing message. The endpoint passes
> this to the R session, and the package installation fails.
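The scenario above can be replayed with a toy model. This sketch assumes times in minutes since midnight (60 = 1:00), PACKAGES represented as a dict from package name to version, and that a 404 is not itself cached. make_endpoint and install are hypothetical names, not R or CDN functions.

```python
TTL = 30  # minutes; each file is cached independently for this long

def make_endpoint(server):
    """Return get(name, now): a toy CDN endpoint with a per-file cache."""
    cache = {}  # name -> (content, expires_at)

    def get(name, now):
        hit = cache.get(name)
        if hit is not None and now < hit[1]:
            return hit[0]  # fresh copy, served without asking the server
        if name not in server:
            return None    # the server answers 404 (not cached in this model)
        cache[name] = (server[name], now + TTL)
        return server[name]

    return get

def install(get, pkg, now):
    """Read PACKAGES, then download the tarball it names; True on success."""
    version = get("PACKAGES", now)[pkg]
    return get(f"{pkg}_{version}.tgz", now) is not None

server = {"PACKAGES": {"PkgA": "1.0"}, "PkgA_1.0.tgz": "old build"}
endpoint = make_endpoint(server)
other_endpoint = make_endpoint(server)

endpoint("PACKAGES", 60)  # 1:00 PACKAGES cached until 1:30

# 1:10 rsync: new index, old tarball removed, new tarball added
server["PACKAGES"] = {"PkgA": "2.0"}
del server["PkgA_1.0.tgz"]
server["PkgA_2.0.tgz"] = "new build"

at_1_20 = install(endpoint, "PkgA", 80)                  # stale index -> 404
at_1_20_elsewhere = install(other_endpoint, "PkgA", 80)  # independent cache
at_1_40 = install(endpoint, "PkgA", 100)                 # index has expired
print(at_1_20, at_1_20_elsewhere, at_1_40)  # False True True
```

The failure comes only from the mismatch between two independently cached files. Neither the stale index nor the missing tarball is wrong on its own.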
>
> Anyone else who tries to install PkgA (and hits the same CDN endpoint)
> will get the same installation failure, until the cache for PACKAGES
> expires at 1:30. However, another person who happens to hit another
> endpoint may be able to install PkgA, because each endpoint does its
> caching independently.
>
> Something similar can happen even without a CDN, because download.packages()
> caches the contents of PACKAGES. However, that can be worked around by
> telling download.packages() to not use the cache, or by simply
> restarting R.
>
> One reason that package installations fail in these cases is that the
> current version of a package is in one directory, and the old
> (archived) versions of a package are in another directory. If current
> and old versions were in the same directory, then package installation
> would not fail.
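A minimal sketch of the layout argument, using hypothetical directory listings after PkgA 2.0 is published. The "Archive/PkgA/" path is modeled loosely on CRAN's src/contrib/Archive/<pkg>/ convention for source packages. Both listings and the tarball_path helper are illustrative assumptions, not actual CRAN tooling.

```python
# Hypothetical directory listings for one repository, after PkgA 2.0 was published.
split_layout = {
    "PkgA_2.0.tgz",               # current version in the main directory
    "Archive/PkgA/PkgA_1.0.tgz",  # older version moved to a separate directory
}
flat_layout = {
    "PkgA_1.0.tgz",               # older version kept next to the current one
    "PkgA_2.0.tgz",
}

# A stale, cached PACKAGES index that still lists the old version.
stale_index = {"PkgA": "1.0"}

def tarball_path(index, pkg):
    """Path an installer would request, based only on the index it has."""
    return f"{pkg}_{index[pkg]}.tgz"

path = tarball_path(stale_index, "PkgA")
split_ok = path in split_layout  # False: the file the index names has moved
flat_ok = path in flat_layout    # True: the old file is still where the index says
```

With the flat layout a stale index still produces a working install. It installs 1.0 rather than 2.0 until the cached PACKAGES expires, which trades "fails" for "briefly out of date".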
>
>
> -Winston
>
>
>
> On Tue, Jan 30, 2018 at 1:19 PM, Dirk Eddelbuettel <edd at debian.org> wrote:
>>
>> I have received three distinct (non-)bug reports where someone claimed a
>> recent package of mine was broken ... simply because the macOS binary was not
>> there.
>>
>> Is there something wrong with the cronjob providing the indices? Why is it
>> pointing people to binaries that do not exist?
>>
>> Concretely, file
>>
>>   https://cloud.r-project.org/bin/macosx/el-capitan/contrib/3.4/PACKAGES
>>
>> contains
>>
>>   Package: digest
>>   Version: 0.6.15
>>   Title: Create Compact Hash Digests of R Objects
>>   Depends: R (>= 2.4.1)
>>   Suggests: knitr, rmarkdown
>>   Built: R 3.4.3; x86_64-apple-darwin15.6.0; 2018-01-29 05:21:06 UTC; unix
>>   Archs: digest.so.dSYM
>>
>> yet the _same directory_ only has:
>>
>>   digest_0.6.14.tgz     15-Jan-2018 21:36       157K
>>
>> I presume this is a temporary accident.
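Dirk's manual comparison can be automated. This minimal sketch parses PACKAGES, which uses R's DCF ("Field: value") format with records separated by blank lines. It then reports entries whose <Package>_<Version>.tgz is absent from a directory listing. It assumes the listing is already available as a list of file names (fetching it is out of scope). parse_packages and missing_binaries are hypothetical helper names.

```python
def parse_packages(text):
    """Parse a DCF-style PACKAGES file into a list of dicts (one per package).

    Continuation lines (starting with whitespace) are appended to the previous field.
    """
    records, current, last_key = [], {}, None
    for line in text.splitlines():
        if not line.strip():
            if current:
                records.append(current)
            current, last_key = {}, None
        elif line[0] in " \t" and last_key:
            current[last_key] += " " + line.strip()
        else:
            key, _, value = line.partition(":")
            last_key = key.strip()
            current[last_key] = value.strip()
    if current:
        records.append(current)
    return records

def missing_binaries(packages_text, filenames):
    """Return binaries named by the index but absent from the directory listing."""
    present = set(filenames)
    expected = (f"{r['Package']}_{r['Version']}.tgz" for r in parse_packages(packages_text))
    return [name for name in expected if name not in present]

index = """Package: digest
Version: 0.6.15
Title: Create Compact Hash Digests of R Objects
Depends: R (>= 2.4.1)
"""
print(missing_binaries(index, ["digest_0.6.14.tgz"]))  # ['digest_0.6.15.tgz']
```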
>>
>> We are all spoiled by you all providing such a wonderfully robust and
>> well-oiled service---so again big THANKS for that--but today something is out
>> of order.
>>
>> Dirk
>>
>> --
>> http://dirk.eddelbuettel.com | @eddelbuettel | edd at debian.org
>>
>> ______________________________________________
>> R-devel at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-devel
>


