[Rd] list.files(., pattern=<>, recursive = TRUE, include.dirs = TRUE)

Gabriel Becker g@bembecker @end|ng |rom gm@||@com
Sat Dec 21 00:48:59 CET 2019


Hi all,

I ran into a weird corner-case of list.files today and I'm wondering what
people think about it and a potential wishlist enhancement related to it.

Consider the case where we call list.files with recursive and include.dirs both
TRUE and we supply a pattern. In this case pattern is applied to directory
names when deciding whether to include a directory in the return value, but
NOT when deciding whether to recurse into it. This behavior is consistent,
but I'd argue it's also counterintuitive. If a directory is excluded for not
matching pattern, I wouldn't necessarily expect its children/contents to even
be candidates for inclusion at first blush.

If others agree this behavior is strange/suboptimal I figure there are a
few different things that can be done here (which I discuss below):

   1. Modify the behavior of list.files(., include.dirs=TRUE, recursive=TRUE,
   pattern=<>) so that
      1. pattern is applied when deciding where to recurse.
      2. all directories that (recursively) contain at least one file (or
      *possibly* empty leaf subdirectory) that matches pattern are
      themselves included in the return value.
   2. Add a recurse.pattern argument to list.files (and probably list.dirs)
   that is used to filter the directories recursed into (ignored when
   recursive == FALSE).
   3. Modify the documentation of list.files to mention this inconsistency,
   so that the behavior is at least documented, even if it's (arguably)
   not ideal.
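
To make option 2 concrete, here is a rough sketch of what a recurse.pattern
could do, emulated in plain R by walking the tree one level at a time. The
helper name list_files_pruned and its argument defaults are my own
assumptions for illustration; nothing like this exists in base R, and a real
patch would live in the C code behind list.files.

```r
## Hypothetical helper, not part of base R: emulates a 'recurse.pattern'
## argument. 'pattern' filters what is returned; 'recurse.pattern' filters
## which directories are descended into (NULL means "no filtering").
list_files_pruned <- function(path, pattern = NULL, recurse.pattern = pattern,
                              include.dirs = TRUE) {
    matches <- function(x, pat)
        if (is.null(pat)) rep(TRUE, length(x)) else grepl(pat, x)
    walk <- function(rel) {
        here <- if (nzchar(rel)) file.path(path, rel) else path
        nms <- list.files(here)          # non-recursive: files and dirs
        if (!length(nms)) return(character())
        isdir <- dir.exists(file.path(here, nms))
        relpths <- if (nzchar(rel)) file.path(rel, nms) else nms
        keep <- matches(nms, pattern) & (include.dirs | !isdir)
        descend <- isdir & matches(nms, recurse.pattern)
        c(relpths[keep],
          unlist(lapply(relpths[descend], walk), use.names = FALSE))
    }
    sort(walk(""))
}

## Throwaway tree: <root>/{good,bad}/{good,bad}/{goodfil,badfil}
root <- tempfile("lfpruned")
for (d in file.path(root, rep(c("good", "bad"), each = 2), c("good", "bad"))) {
    dir.create(d, recursive = TRUE)
    file.create(file.path(d, c("goodfil", "badfil")))
}

## Pruned recursion: nothing under a "bad" directory is ever visited.
pruned <- list_files_pruned(root, pattern = "^[^b]+$")
## "good"  "good/good"  "good/good/goodfil"

## recurse.pattern = NULL recurses everywhere, like list.files does today.
unpruned <- list_files_pruned(root, pattern = "^[^b]+$", recurse.pattern = NULL)
```

With recurse.pattern defaulting to pattern, this behaves like *1.1*; passing
recurse.pattern = NULL switches the pruning off again.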

My thoughts:

Both *1.1* and *1.2* are breaking changes, though I suspect that setting
include.dirs and recursive both to TRUE (or, in fact, setting include.dirs
to TRUE and supplying a pattern) is relatively rare. *1.1* is a more
drastic change but, in my opinion, ultimately more intuitive than *1.2*.

I think *2* could be useful, though only if the pattern would actually be
the same at different steps of recursion often enough in practice
(sometimes but not always, I'd think). *2* would be fully backwards
compatible (computing on formals lists notwithstanding...) unless its
default was set to pattern when include.dirs is TRUE, in which case it
would be a disable-able implementation of *1.1*.

I think *3* would be good to do if there's no appetite for doing anything
higher on the list.

I am happy to submit patches (as wishlist items, except for *3*) for any
of the above if there is interest.

Thoughts?
~G

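## build td/{good,bad}/{good,bad}, each holding the files goodfil and badfil
td = file.path(tempdir(), "listfilestst")
dns = c("good", "bad")

## all four two-level directory paths
dpths = file.path(td, as.vector(outer(dns, dns, paste,
                                      sep = .Platform$file.sep)))
invisible(lapply(dpths, dir.create, recursive = TRUE))
## one goodfil and one badfil in each leaf directory
fpths = as.vector(outer(dpths, c("goodfil", "badfil"), file.path))
invisible(sapply(fpths, function(pth) cat(" ", file = pth)))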

## all files(/+dirs)

list.files(td, recursive = TRUE)
## [1] "bad/bad/badfil"    "bad/bad/goodfil"   "bad/good/badfil"
## [4] "bad/good/goodfil"  "good/bad/badfil"   "good/bad/goodfil"
## [7] "good/good/badfil"  "good/good/goodfil"
list.files(td, recursive = TRUE, include.dirs = TRUE)
##  [1] "bad"               "bad/bad"           "bad/bad/badfil"
##  [4] "bad/bad/goodfil"   "bad/good"          "bad/good/badfil"
##  [7] "bad/good/goodfil"  "good"              "good/bad"
## [10] "good/bad/badfil"   "good/bad/goodfil"  "good/good"
## [13] "good/good/badfil"  "good/good/goodfil"


## no b files
list.files(td, recursive = TRUE, pattern = "^[^b]+$")
## [1] "bad/bad/goodfil"   "bad/good/goodfil"  "good/bad/goodfil"
## [4] "good/good/goodfil"

## no b files include.dirs=TRUE
## bad is not included but bad/good is (both are directories)
## bad/bad/goodfil is also included
list.files(td, recursive = TRUE, pattern = "^[^b]+$", include.dirs = TRUE)
## [1] "bad/bad/goodfil"   "bad/good"          "bad/good/goodfil"
## [4] "good"              "good/bad/goodfil"  "good/good"
## [7] "good/good/goodfil"
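
For comparison, option *1.2* (every directory that recursively contains a
matching file is itself returned) can be approximated by post-processing the
plain recursive listing. This is only a sketch: the helper name
list_files_with_parents is hypothetical, it builds its own small tree rather
than reusing td, and it ignores the "empty leaf subdirectory" case mentioned
in the proposal.

```r
## Hypothetical helper, not part of base R: adds every ancestor directory
## of a matching file to the result, whether or not its name matches.
list_files_with_parents <- function(path, pattern) {
    hits <- list.files(path, recursive = TRUE, pattern = pattern)
    ancestors <- function(p) {
        out <- character()
        ## dirname() of a single-component relative path is "."
        while ((p <- dirname(p)) != ".") out <- c(out, p)
        out
    }
    sort(unique(c(hits, unlist(lapply(hits, ancestors), use.names = FALSE))))
}

## Throwaway tree: <root>/{good,bad}/{goodfil,badfil}
root <- tempfile("lfparents")
for (d in c("good", "bad")) {
    dir.create(file.path(root, d), recursive = TRUE)
    file.create(file.path(root, d, c("goodfil", "badfil")))
}

## "bad" is returned because it contains bad/goodfil, even though the
## name "bad" itself does not match the pattern.
with_parents <- list_files_with_parents(root, "^[^b]+$")
```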
