[Rd] Question about Unix file paths (and proposal for new regexp class)

Gabor Grothendieck ggrothendieck at myway.com
Wed Nov 26 15:37:45 MET 2003


> Date: Wed, 26 Nov 2003 13:52:44 +0100 
> From: Martin Maechler <maechler at stat.math.ethz.ch>
> To: <Kurt.Hornik at wu-wien.ac.at> 
> Cc: <r-devel at stat.math.ethz.ch> 
> Subject: Re: [Rd] Question about Unix file paths 
> 
>  
>  
> >>>>> " Kurt" == Kurt Hornik <Kurt.Hornik at wu-wien.ac.at>
> >>>>> on Wed, 26 Nov 2003 10:05:42 +0100 writes:
> 
> >>>>> Prof Brian Ripley writes:
> >> On Mon, 24 Nov 2003, Duncan Murdoch wrote:
> >>> >Duncan Murdoch <dmurdoch at pair.com> writes:
> >>> >
> >>> >> Gabor Grothendieck pointed out a bug to me in
> >>> >> list.files(..., full.name=TRUE), that essentially
> >>> >> comes down to the fact that in Windows it's not
> >>> >> always valid to add a path separator (slash or
> >>> >> backslash) between a path specifier and a filename. For
> >>> >> example,
> >>> >> 
> >>> >> c:foo
> >>> >> 
> >>> >> is different from
> >>> >> 
> >>> >> c:\foo
> >>> >> 
> >>> >> and there are other examples.
> >>> 
> >>> I've committed a change to r-patched to fix this in
> >>> Windows only. Sounds like it's not an issue elsewhere.
> 
> >> I think there are some potential issues with doubling
> >> separators and final separators on dirs. On Unix file
> >> systems /part1//part2 and /path/to/dir/ are valid.
> >> However, file systems on Unix may not be Unix file
> >> systems: examples are earlier MacOS systems on MacOS X
> >> and mounted Windows and Novell systems on Linux. I would
> >> not want to assume that all of these combinations worked.
> 
> >>> Gabor also suggested an option to use shell globbing
> >>> instead of regular expressions to select the files in
> >>> the list, e.g.
> >>> 
> >>> list.files(dir="/", pattern="a*.dat", glob=T)
> >>> 
> >>> This would be easy to do in Windows, but from the little
> >>> I know about Unix programming, would not be so easy
> >>> there, so I haven't done anything about it.
> 
> >> It would be shell-dependent and OS-dependent as well as a
> >> retrograde step, as those who wanted to use regular
> >> expressions no longer would be able to.
> 
> Kurt> Right. In any case, an explicit glob() function
> Kurt> seems preferable to me ...
> 
> Good idea!
> 
> More than 12 years ago, I had a similar one, and wrote a
> "pat2grep()" {pattern to grep regular expression} function
> --- for S-plus on Unix --- which I have now renamed to glob2regexp():
> -- still not really usable outside unix (or windows with the
> 'sed' tool in the path), nor perfect, but maybe a good start:
> 
> sys <- function(...) system(paste(..., sep = ""))
> 
> glob2regexp <- function(pattern)
> {
>     ## Purpose: Change "ls pattern" to "grep regular expression" pattern.
>     ## -------------------------------------------------------------------------
>     ## Author: Martin Maechler ETH Zurich, ~ 1991
>     sys("echo '", pattern, "'| sed ",
>         "'s/\\./\\\\./g;s/*/.*/g;s/?/./g; s/^/^/;s/$/$/; s/\\.\\*\\$$//'")
> }
> 
> E.g.,
> 
> > glob2regexp("a*.dat")
> ^a.*\.dat$
> 
> > glob2regexp("a?bc*.t??")
> ^a.bc.*\.t..$
> 
> and one could use it as
> 
> list.files(...., pattern = glob2regexp("a*.dat"))
> 
> Of course, the function needs to be changed to simply use things like
> sub() and gsub() --- another minor exercise for our audience ...
> 
> Martin

This is quite nifty.  One advantage is that glob2regexp does not
need to know the directory.
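
For what it's worth, the sub()/gsub() rewrite that Martin leaves as an
exercise might look roughly like the sketch below.  It assumes only base R,
covers just the '.', '*' and '?' cases that the sed version covers, and does
not escape other regexp metacharacters (brackets, parentheses, '+' and so
on), so it is a starting point rather than a finished function.

```r
## Sketch only: glob2regexp() without the external 'sed' call.
## Handles '.', '*' and '?'; other metacharacters are passed through as-is.
glob2regexp <- function(pattern)
{
    rx <- gsub("\\.", "\\\\.", pattern)    # literal dot      -> \.
    rx <- gsub("\\*", ".*", rx)            # glob '*'         -> .*
    rx <- gsub("\\?", ".", rx)             # glob '?'         -> .
    rx <- paste("^", rx, "$", sep = "")    # anchor both ends
    sub("\\.\\*\\$$", "", rx)              # drop a redundant trailing '.*$'
}
```

It would be used the same way, e.g. list.files(pattern = glob2regexp("a*.dat")),
and since it only rewrites the pattern string it works for whatever directory
is being listed.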

Perhaps what is needed is a regexp class which stores the type of
regexp in the object itself: basic, extended, perl or glob.  This
would clean up and unify the various extra arguments (such as
extended= and perl= in grep() and its relatives) floating around
in a number of functions.
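
To make the idea concrete, here is a rough S3 sketch.  The names regexp()
and rxmatch() are hypothetical and not part of R.  The glob case assumes a
current R in which utils::glob2rx() is available, and the basic type is
only recorded, not matched, because the sketch has no basic-regexp matcher
to hand it to.

```r
## Hypothetical constructor: a pattern that carries its own type.
regexp <- function(pattern, type = c("extended", "basic", "perl", "glob"))
{
    type <- match.arg(type)
    structure(list(pattern = pattern, type = type), class = "regexp")
}

## Hypothetical helper: which elements of 'x' match the pattern object?
## The matching style comes from the object, not from extra arguments.
rxmatch <- function(rx, x)
{
    stopifnot(inherits(rx, "regexp"))
    switch(rx$type,
           extended = grepl(rx$pattern, x),
           perl     = grepl(rx$pattern, x, perl = TRUE),
           glob     = grepl(utils::glob2rx(rx$pattern), x),
           basic    = stop("basic regexps are not handled in this sketch"))
}
```

With this, rxmatch(regexp("a*.dat", type = "glob"), files) and
rxmatch(regexp("^a.*\\.dat$"), files) go through the same interface, and a
function such as list.files() would only need to accept the object.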


