[Rd] tools::md5sum(directory) behavior different on Windows vs. Unix

Scott Kostyshak skostysh at princeton.edu
Sun Sep 29 10:16:10 CEST 2013


On Mon, Sep 9, 2013 at 3:00 AM, Scott Kostyshak <skostysh at princeton.edu> wrote:
> tools::md5sum gives a warning if it receives a directory as an
> argument on Unix but not on Windows.
>
> From what I understand, this happens because on Windows a directory is
> not treated as a file, so fopen returns NULL and NA is returned
> without a warning. On Unix, a directory is treated as a file, so fopen
> does not return NULL; md5 is then run and fails, leading to a warning.
>
> This is a good opportunity for me to understand further (in addition
> to [1] and the many places where OS special cases are mentioned) in
> which cases R tries to behave the same on Windows as on Unix and in
> which cases it allows for differences (in this case, a warning vs. no
> warning). For example, it would be straightforward to create a patch
> that would lead to the same behavior in this case. tools::md5sum could
> either issue a warning for each argument that is a directory or it
> could issue no warning (consistent with file.info). Would either patch
> be considered?

Attached is a patch that gives a warning if an element of the files
argument is not a regular file (e.g. it is a directory or does not
exist). In my opinion the advantages of this patch are:

(1) the same warnings are generated on all platforms in the case where
one of the elements is a directory.
(2) a warning is also given if a file does not exist.
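
To illustrate the intended behavior, here is a rough sketch (not the
patch itself). The wrapper md5sum_checked is a hypothetical name that
applies the same file_test("-f", ...) check on top of the existing
tools::md5sum; the temporary paths are made up for the example.

```r
# Hypothetical wrapper (not part of R): mirrors the logic of the attached
# patch, but built on the exported tools::md5sum() instead of .Call(Rmd5, ...).
md5sum_checked <- function(files) {
    # TRUE only for paths that exist and are not directories
    is_regular <- utils::file_test("-f", files)
    if (!all(is_regular))
        warning("The following are not regular files: ",
                paste(shQuote(files[!is_regular]), collapse = " "))
    # Start with NA for every element, then fill in the regular files
    sums <- rep(NA_character_, length(files))
    names(sums) <- files
    if (any(is_regular))
        sums[is_regular] <- unname(tools::md5sum(files[is_regular]))
    sums
}

# Example inputs: one regular file, one directory, one non-existent path
f <- tempfile(); writeLines("hello", f)
d <- tempfile(); dir.create(d)
missing_path <- tempfile()

# Show the warning as a message so the result can still be inspected
res <- withCallingHandlers(
    md5sum_checked(c(f, d, missing_path)),
    warning = function(w) {
        message("warning: ", conditionMessage(w))
        invokeRestart("muffleWarning")
    })
print(res)
```

With this sketch, a single warning lists both the directory and the
missing path on any platform, and only the regular file gets a checksum.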

Comments?

Scott

>
> Or is this difference encouraged because the concept of a file is
> different on Unix than on Windows?
>
> Scott
>
> [1] http://cran.r-project.org/bin/windows/base/rw-FAQ.html#What-should-I-expect-to-behave-differently-from-the-Unix-version
>
>
> --
> Scott Kostyshak
> Economics PhD Candidate
> Princeton University
-------------- next part --------------
Index: trunk/src/library/tools/R/md5.R
===================================================================
--- trunk/src/library/tools/R/md5.R	(revision 64011)
+++ trunk/src/library/tools/R/md5.R	(working copy)
@@ -17,7 +17,18 @@
 #  http://www.r-project.org/Licenses/
 
 md5sum <- function(files)
-    structure(.Call(Rmd5, files), names=files)
+{
+    reg_ <- file_test("-f", files)
+    regFiles <- files[reg_]
+    notReg <- files[!reg_]
+    if(!all(reg_))
+        warning("The following are not regular files: ",
+                paste(shQuote(notReg), collapse = " "))
+    names(files) <- files
+    files[!reg_] <- NA
+    files[reg_] <- .Call(Rmd5, regFiles)
+    files
+}
 
 .installMD5sums <- function(pkgDir, outDir = pkgDir)
 {
Index: trunk/src/library/tools/man/md5sum.Rd
===================================================================
--- trunk/src/library/tools/man/md5sum.Rd	(revision 64011)
+++ trunk/src/library/tools/man/md5sum.Rd	(working copy)
@@ -18,7 +18,8 @@
 \value{
   A character vector of the same length as \code{files}, with names
   equal to \code{files}. The elements
-  will be \code{NA} for non-existent or unreadable files, otherwise
+  will be \code{NA} for non-existent or unreadable files (in which case
+  a warning will be generated), otherwise
   a 32-character string of hexadecimal digits.
 
   On Windows all files are read in binary mode (as the \code{md5sum}

