[Rd] Request: tools::md5sum should accept connections and finally in-memory objects

Duncan Murdoch
Fri May 1 23:35:48 CEST 2020


The tools package is not for users; it's for functions that R uses in 
installing packages, checking them, etc.  If you want a function for 
users, it would belong in utils.  But what's wrong with the digest 
package?  What's the argument that R Core should take this on?
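
For illustration, here is a minimal sketch of what is already possible 
without any change to tools::md5sum().  The helper md5_of_raw() is a 
hypothetical name, not part of R; it assumes that writing the bytes to a 
temporary file is acceptable.  The digest branch runs only if that 
package is installed, and the comparison is printed rather than assumed.

```r
# Hash an in-memory object by serializing it to raw bytes first.
x <- runif(10)
bytes <- serialize(x, NULL)

# Base R only: tools::md5sum() works on files, so write the bytes to a
# temporary file.  md5_of_raw() is a hypothetical helper, not part of R.
md5_of_raw <- function(bytes) {
  stopifnot(is.raw(bytes))
  f <- tempfile()
  on.exit(unlink(f))          # remove the temporary file on exit
  writeBin(bytes, f)
  unname(tools::md5sum(f))    # drop the file-name attribute
}
h_base <- md5_of_raw(bytes)

# With the digest package (if installed), no temporary file is needed.
# serialize = FALSE asks digest() to hash the raw bytes as given.
if (requireNamespace("digest", quietly = TRUE)) {
  h_digest <- digest::digest(bytes, algo = "md5", serialize = FALSE)
  print(identical(h_base, h_digest))
}
```

The temporary-file detour is the cost of staying in base R; digest 
avoids it by hashing the raw vector directly.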

Duncan Murdoch

On 01/05/2020 5:00 p.m., Dénes Tóth wrote:
> 
> AFAIK there is no hashing utility in base R which can create hash
> digests of arbitrary R objects. However, as also described by Henrik
> Bengtsson in [1], we have tools::md5sum() which calculates MD5 hashes of
> files. Calculating hashes of in-memory objects is a very common task in
> several areas, as demonstrated by the popularity of the 'digest' package
> (~850,000 downloads/month).
> 
> Upon inspection of the relevant files in the R source (e.g., [2] and
> [3]), it seems that all the building blocks are already implemented, so
> hashing need not be restricted to files. I would like to ask:
> 
> 1) Why is md5_buffer unused?
> In src/library/tools/src/md5.c [see 2], md5_buffer is implemented, which
> seems to be the counterpart of md5_stream for non-file inputs:
> 
> ---
> #ifdef UNUSED
> /* Compute MD5 message digest for LEN bytes beginning at BUFFER.  The
>      result is always in little endian byte order, so that a byte-wise
>      output yields to the wanted ASCII representation of the message
>      digest.  */
> static void *
> md5_buffer (const char *buffer, size_t len, void *resblock)
> {
>     struct md5_ctx ctx;
> 
>     /* Initialize the computation context.  */
>     md5_init_ctx (&ctx);
> 
>     /* Process whole buffer but last len % 64 bytes.  */
>     md5_process_bytes (buffer, len, &ctx);
> 
>     /* Put result in desired memory area.  */
>     return md5_finish_ctx (&ctx, resblock);
> }
> #endif
> ---
> 
> 2) How can the R community help so that this feature becomes available
> in package 'tools'?
> 
> Suggestions:
> As a first step, it would be great if tools::md5sum would support
> connections (credit goes to Henrik for the idea). E.g., instead of the
> signature tools::md5sum(files), we could have tools::md5sum(files, conn
> = NULL), which would allow:
> 
> x <- runif(10)
> tools::md5sum(conn = rawConnection(serialize(x, NULL)))
> 
> To avoid the inconsistency between 'files' (which computes the hash
> digests in a vectorized manner, that is, one for each file) and 'conn'
> (which expects a single connection), and to make it easier to extend the
> hashing for other algorithms without changing the main R interface, a
> more involved solution would be to introduce tools::hash and
> tools::hashes, in a similar vein to digest::digest and digest::getVDigest.
> 
> Regards,
> Denes
> 
> 
> [1]: https://github.com/HenrikBengtsson/Wishlist-for-R/issues/21
> [2]: https://github.com/wch/r-source/blob/5a156a0865362bb8381dcd69ac335f5174a4f60c/src/library/tools/src/md5.c#L172
> [3]: https://github.com/wch/r-source/blob/5a156a0865362bb8381dcd69ac335f5174a4f60c/src/library/tools/src/Rmd5.c#L27