[Rd] WISH: Built-in R session-specific universally unique identifier (UUID)

Henrik Bengtsson henr|k@bengt@@on @end|ng |rom gm@||@com
Tue May 21 01:48:08 CEST 2019


# Proposal

Provide a built-in mechanism for obtaining an identifier for the
current R session, e.g.

> Sys.info()[["session_uuid"]]
[1] "4258db4d-d4fb-46b3-a214-8c762b99a443"

The identifier should be "unique" in the sense that the probability
of two R sessions(*) having the same identifier should be extremely
small.  There's no need for reproducibility, i.e. the algorithm for
producing the identifier may be changed at any time.

(*) Two R sessions running at different times (seconds, minutes, days,
years, ...) or on different machines (locally or anywhere in the
world).


# Use cases

In parallel-processing workflows, R objects may be "exported"
(serialized) to background R processes ("workers") for further
processing.  In other workflows, objects may be saved to file to be
reloaded in a future R session.  However, certain types of objects in
R may only be relevant, or valid, in the R session that created
them.  Attempts to use them in other R processes may give an obscure
error or, in the worst case, produce garbage results.

Having an identifier that is unique to each R process will make it
possible to detect when an object is used in the wrong context.  This
can be done by attaching the session identifier to the object.  For
example,

obj <- 42L
attr(obj, "owner") <- Sys.info()[["session_uuid"]]

With this, it is easy to validate the "ownership" later:

stopifnot(identical(attr(obj, "owner"), Sys.info()[["session_uuid"]]))
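
The tag-and-check pattern above can be wrapped in a pair of helpers.  The
following is only a minimal sketch: session_uuid(), set_owner() and
assert_owner() are hypothetical names, and session_uuid() is a crude
stand-in for the proposed Sys.info()[["session_uuid"]], which does not
exist in base R today.

```r
# Hypothetical stand-in for the proposed Sys.info()[["session_uuid"]];
# base R has no such field. Computed once and cached for the session.
session_uuid <- local({
  id <- NULL
  function() {
    if (is.null(id)) {
      # Assumption: any reasonably unique string is good enough for this sketch
      id <<- paste(Sys.getpid(), format(Sys.time(), "%Y%m%d%H%M%OS6"),
                   basename(tempdir()), sep = "-")
    }
    id
  }
})

# Attach the current session's identifier to an object
set_owner <- function(obj) {
  attr(obj, "owner") <- session_uuid()
  obj
}

# Signal an informative error if the object belongs to another session
assert_owner <- function(obj) {
  owner <- attr(obj, "owner", exact = TRUE)
  if (!identical(owner, session_uuid())) {
    stop("Object was created in another R session (owner: ",
         if (is.null(owner)) "<none>" else owner, ")", call. = FALSE)
  }
  invisible(obj)
}

obj <- set_owner(42L)
assert_owner(obj)  # passes silently in the session that created 'obj'

# Simulate an object that arrived from a different session
foreign <- obj
attr(foreign, "owner") <- "some-other-session"
res <- tryCatch(assert_owner(foreign), error = function(e) conditionMessage(e))
```

With a helper like assert_owner(), the failure becomes an explicit error
message at the point of use rather than an obscure error further
downstream.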

I argue that such an identifier should be part of base R for easy
access and to avoid each developer having to roll their own.


# Possible implementation

One proposal would be to bring Simon Urbanek's 'uuid' package
(https://cran.r-project.org/package=uuid) into base R.  This package
provides:

> uuid::UUIDgenerate()
[1] "b7de6182-c9c1-47a8-b5cd-e5c8307a8efb"

based on Theodore Ts'o's libuuid
(https://mirrors.edge.kernel.org/pub/linux/utils/util-linux/).  From
'man uuid_generate':

"The uuid_generate function creates a new universally unique
identifier (UUID). The uuid will be generated based on high-quality
randomness from /dev/urandom, if available. If it is not available,
then uuid_generate will use an alternative algorithm which uses the
current time, the local ethernet MAC address (if available), and
random data generated using a pseudo-random generator.
[...]
The UUID is 16 bytes (128 bits) long, which gives approximately
3.4x10^38 unique values (there are approximately 10^80 elementary
particles in the universe according to Carl Sagan's Cosmos). The new
UUID can reasonably be considered unique among all UUIDs created on
the local system, and among UUIDs created on other systems in the past
and in the future."
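
The numbers in the man page excerpt can be checked with a few lines of R.
This is a rough sketch, not part of libuuid: it assumes a random
(version 4) UUID, in which 6 of the 128 bits are fixed for the version
and variant fields, leaving 122 random bits, and it uses the usual
birthday approximation p ~ 1 - exp(-n^2 / (2 N)) for the probability
that any two of n identifiers collide.  The function name
collision_prob() is made up for illustration.

```r
# Total number of distinct 128-bit values: about 3.4e38
n_values <- 2^128

# Assumption: version 4 UUIDs carry 122 random bits
N <- 2^122

# Birthday approximation for the probability that at least two of 'n'
# randomly drawn identifiers coincide; expm1() keeps precision for tiny values
collision_prob <- function(n) -expm1(-n^2 / (2 * N))

# Probability of any collision among one billion R sessions
p <- collision_prob(1e9)
```

Even for a billion sessions, p comes out many orders of magnitude below
anything that matters in practice, which is the sense of "unique"
intended in the proposal.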

An alternative that does not require adding a dependency on the
libuuid library would be to roll a poor man's version based on a set
of semi-unique attributes, e.g.

make_id <- function(...) {
  args <- list(...)
  # tools::md5sum() only operates on files, so go via a temporary file
  f <- tempfile()
  on.exit(file.remove(f))
  saveRDS(args, file = f)
  unname(tools::md5sum(f))
}

session_id <- local({
  id <- NULL
  function() {
    if (is.null(id)) {
      id <<- make_id(
        info    = Sys.info(),
        pid     = Sys.getpid(),
        tempdir = tempdir(),
        time    = Sys.time(),
        random  = sample.int(.Machine$integer.max, size = 1L)
      )
    }
    id
  }
})

Example:

> session_id()
[1] "8d00b17384e69e7c9ecee47e0426b2a5"

> session_id()
[1] "8d00b17384e69e7c9ecee47e0426b2a5"
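
If the 8-4-4-4-12 layout used in the proposal is preferred, a 32-digit
hexadecimal digest like the one above can be regrouped with a regular
expression.  This is only a sketch: as_uuid_layout() is a hypothetical
helper, and the result merely looks like a UUID.  It does not set the
version and variant bits, so it should not be taken for a
standards-conforming UUID.

```r
# Regroup a 32-character lowercase hex string as 8-4-4-4-12.
# Note: this changes the layout only; the version/variant bits are not set.
as_uuid_layout <- function(hex) {
  stopifnot(is.character(hex), length(hex) == 1L,
            grepl("^[0-9a-f]{32}$", hex))
  sub("^(.{8})(.{4})(.{4})(.{4})(.{12})$", "\\1-\\2-\\3-\\4-\\5", hex)
}

id <- as_uuid_layout("8d00b17384e69e7c9ecee47e0426b2a5")
id
# [1] "8d00b173-84e6-9e7c-9ece-e47e0426b2a5"
```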

/Henrik

PS. Having a built-in make_id() function would be handy too, e.g. when
creating object-specific identifiers for other purposes.

PPS. It would be neat if there was an object, or connection, interface
for tools::md5sum(), which currently only operates on files sitting on
the file system. The digest package provides this functionality.
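
Until such an interface exists, an object-level wrapper can be sketched
in base R by serializing to a temporary file first.  md5_object() is a
hypothetical name, not an existing function.  One assumption to keep in
mind: the bytes produced by serialize() include a header recording the R
version that wrote them, so the digests are only meant to be compared
within the same R installation.

```r
# MD5 digest of an arbitrary R object, via a temporary file because
# tools::md5sum() only accepts file paths
md5_object <- function(x) {
  f <- tempfile()
  on.exit(unlink(f))
  # serialize() with connection = NULL returns the object as a raw vector
  writeBin(serialize(x, connection = NULL), f)
  unname(tools::md5sum(f))
}

a <- md5_object(list(x = 1:3, y = "abc"))
b <- md5_object(list(x = 1:3, y = "abc"))
d <- md5_object(list(x = 1:4, y = "abc"))
```

Here a and b are expected to agree because the two objects are
identical, while d differs because its content does.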


