[Rd] CRAN indices out of whack (for at least macOS)

Winston Chang winstonchang1 at gmail.com
Sat Feb 3 20:31:47 CET 2018


Although it may not have been the cause of this particular index
inconsistency, there are other causes of intermittent index
inconsistencies. They could be avoided if there were a different
directory structure on CRAN servers.

One of the causes of inconsistencies is caching. With
cloud.r-project.org (note that this is not cran.r-project.org),
there is a CDN in front of the server; the CDN has caching endpoints
around the world, and will serve files to the user from the nearest
endpoint.

The cache timeout for each file is 30 minutes. Suppose a user
downloads file X from some endpoint at 1:00. If the endpoint doesn't
already have X in the cache, then it will fetch the file from the
server, and then send it to the user. The endpoint will consider the
cached file valid until 1:30. If another user requests X at 1:20, the
endpoint will serve up the file from its cache without checking with
the server. If someone requests X at 1:40, the endpoint will check
with the server to see if its cached version is still valid (and
download an updated version if necessary), then it will send the file
to the user.
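
To make the timing concrete, here is a rough sketch of a single endpoint's
per-file cache in Python. It is a toy model, not the CDN's actual
implementation: the Endpoint class, its fields, and the "minutes after
1:00" clock are all made up for illustration, and revalidation after the
timeout is simplified to a plain refetch from the server.

```python
class Endpoint:
    """Toy model of one CDN endpoint with a per-file cache timeout."""

    def __init__(self, origin, ttl_minutes=30):
        self.origin = origin      # dict: file name -> contents on the server
        self.ttl = ttl_minutes
        self.cache = {}           # file name -> (contents, minute fetched)
        self.origin_hits = 0      # how often the server was contacted

    def get(self, name, now):
        entry = self.cache.get(name)
        if entry is not None and now - entry[1] < self.ttl:
            return entry[0]       # still valid: serve without asking the server
        self.origin_hits += 1
        contents = self.origin.get(name)   # None stands in for a 404
        if contents is not None:
            self.cache[name] = (contents, now)
        return contents


# Times are minutes after 1:00.
origin = {"X": "v1"}
endpoint = Endpoint(origin)

at_100 = endpoint.get("X", now=0)    # not cached yet: fetched from the server
origin["X"] = "v2"                   # the file changes on the server
at_120 = endpoint.get("X", now=20)   # served from cache, server not consulted
at_140 = endpoint.get("X", now=40)   # timeout passed: fetched again
```

In this model the request at 1:20 still returns the old contents, and
only the request at 1:40 picks up the change.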

Because the caching is on a per-file basis, this can lead to a
situation where the PACKAGES file served by an endpoint is out of sync
with the .tgz package files. Imagine this scenario:

1:00 Someone downloads PACKAGES. It is not yet in the endpoint's
cache, so it fetches it from the server. This version of PACKAGES says
that the current version of PkgA is 1.0.
1:10 The server performs an rsync from the central CRAN mirror. It
gets an updated version of PACKAGES, which says that the current
version of PkgA is 2.0. The rsync also removes the PkgA_1.0.tgz file
and adds PkgA_2.0.tgz.
1:20 Someone else wants to install PkgA, so their R session first
downloads PACKAGES, which points to PkgA_1.0.tgz. Then R tries to
download PkgA_1.0.tgz; it is not in the endpoint's cache, so the
endpoint tries to fetch it from the server, but the file is not
present there, so the server returns a 404 (Not Found) response. The
endpoint passes this to the R session, and the package installation fails.

Anyone else who tries to install PkgA (and hits the same CDN endpoint)
will get the same installation failure, until the cache for PACKAGES
expires at 1:30. However, another person who happens to hit another
endpoint may be able to install PkgA, because each endpoint does its
caching independently.
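
The scenario above can be replayed with a small Python simulation. This is
only a sketch under simplifying assumptions: fetch() and install() are
hypothetical helpers, each endpoint is just a dict, the server is a dict of
file names, and times are minutes after 1:00.

```python
TTL = 30  # cache timeout in minutes


def fetch(cache, origin, name, now):
    """Serve name from one endpoint's cache, refetching once the TTL has passed."""
    hit = cache.get(name)
    if hit is not None and now - hit[1] < TTL:
        return hit[0]
    body = origin.get(name)          # None stands in for a 404
    if body is not None:
        cache[name] = (body, now)
    return body


def install(cache, origin, pkg, now):
    """Mimic an install: read the index, then request the .tgz it names."""
    version = fetch(cache, origin, "PACKAGES", now)[pkg]
    tgz = f"{pkg}_{version}.tgz"
    return tgz if fetch(cache, origin, tgz, now) is not None else None


origin = {"PACKAGES": {"PkgA": "1.0"}, "PkgA_1.0.tgz": b"old"}
endpoint_a, endpoint_b = {}, {}      # two endpoints, cached independently

fetch(endpoint_a, origin, "PACKAGES", now=0)   # 1:00, endpoint A caches the 1.0 index

# 1:10, rsync: new index, old binary removed, new binary added
origin["PACKAGES"] = {"PkgA": "2.0"}
del origin["PkgA_1.0.tgz"]
origin["PkgA_2.0.tgz"] = b"new"

at_120_a = install(endpoint_a, origin, "PkgA", now=20)   # stale index, file is gone
at_120_b = install(endpoint_b, origin, "PkgA", now=20)   # cold cache, fresh index
at_135_a = install(endpoint_a, origin, "PkgA", now=35)   # index expired and refetched
```

Here the install through endpoint A fails at 1:20 while the one through
endpoint B succeeds, and endpoint A recovers once its copy of PACKAGES has
expired.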

Something similar can happen even without a CDN, because
download.packages() caches the contents of PACKAGES. However, that can
be worked around by telling download.packages() to not use the cache,
or by simply restarting R.

One reason that package installations fail in these cases is that the
current version of a package is in one directory, and the old
(archived) versions of a package are in another directory. If current
and old versions were in the same directory, then package installation
would not fail.
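
As a rough illustration of that point, the Python sketch below resolves a
stale index against two hypothetical directory layouts. The resolve()
helper and the layout names are invented for this example, and it assumes
the client simply asks for the file name the index implies.

```python
def resolve(index, directory, pkg):
    """Return the file a stale or fresh index points at, or None for a 404."""
    name = f"{pkg}_{index[pkg]}.tgz"
    return name if name in directory else None


stale_index = {"PkgA": "1.0"}    # cached before the update to 2.0

# Current layout: the old version has been moved out of this directory.
split_layout = {"PkgA_2.0.tgz"}

# Hypothetical layout: current and old versions live side by side.
same_dir_layout = {"PkgA_1.0.tgz", "PkgA_2.0.tgz"}

with_split = resolve(stale_index, split_layout, "PkgA")        # missing file
with_same_dir = resolve(stale_index, same_dir_layout, "PkgA")  # old version found
```

With everything in one directory, a client holding a stale index would
get the older version rather than a failure, at least until its index is
refreshed.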


-Winston



On Tue, Jan 30, 2018 at 1:19 PM, Dirk Eddelbuettel <edd at debian.org> wrote:
>
> I have received three distinct (non-)bug reports where someone claimed a
> recent package of mine was broken ... simply because the macOS binary was not
> there.
>
> Is there something wrong with the cronjob providing the indices? Why is it
> pointing people to binaries that do not exist?
>
> Concretely, file
>
>   https://cloud.r-project.org/bin/macosx/el-capitan/contrib/3.4/PACKAGES
>
> contains
>
>   Package: digest
>   Version: 0.6.15
>   Title: Create Compact Hash Digests of R Objects
>   Depends: R (>= 2.4.1)
>   Suggests: knitr, rmarkdown
>   Built: R 3.4.3; x86_64-apple-darwin15.6.0; 2018-01-29 05:21:06 UTC; unix
>   Archs: digest.so.dSYM
>
> yet the _same directory_ only has:
>
>   digest_0.6.14.tgz     15-Jan-2018 21:36       157K
>
> I presume this is a temporary accident.
>
> We are all spoiled by you all providing such a wonderfully robust and
> well-oiled service---so again big THANKS for that--but today something is out
> of order.
>
> Dirk
>
> --
> http://dirk.eddelbuettel.com | @eddelbuettel | edd at debian.org
>
> ______________________________________________
> R-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel


