[R] system() and file names with spaces

Richard A. O'Keefe ok at cs.otago.ac.nz
Thu Dec 9 01:55:50 CET 2004


Consider the question we had recently:  "how do I count the lines in a file
without reading it into R?"  The solution I suggested was

    as.numeric(system(paste("wc -l <", filename), TRUE))

Unfortunately, it doesn't work, or at least, not all the time.
If you already know all about that, and don't care, or already have
a solution, stop reading now.  Otherwise, let me try to undo any
harm I may have done by providing a fuller solution.

We've had several reports in this list about problems caused by Windows
file names with spaces in them.  File names with spaces are also common
in MacOS X, so common, in fact, that file name completion in a Terminal
actually works (if you have a file name "Foo Bar", and type F, o, TAB
you get Foo\ Bar).  File names with spaces are possible in other Unix
systems too, and always have been, though they are less likely.

So suppose there is a file "Foo Bar" whose lines you want to count.
> file.name <- "Foo Bar"
> system(paste("wc -l <", file.name))
executes the command
   wc -l < Foo Bar
which gives you the line count of Bar if there is one, or fails if there is not,
and ignores Foo (should there be one) and of course ignores "Foo Bar".

What can we do about it?  Well, we can try this:

    for.system <- function (s) gsub(" ", "\\\\ ", s)

    system(paste("wc -l <", for.system(file.name)), TRUE)
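
A quick way to check what the shell will actually be handed is to build the
command string and print it before running anything.  This is only a sketch
using the space-only for.system() above; nothing here touches the file system.

```r
# The space-only version from above.
for.system <- function (s) gsub(" ", "\\\\ ", s)

# Build the command without running it, then print it as the shell would see it.
cmd <- paste("wc -l <", for.system("Foo Bar"))
writeLines(cmd)   # wc -l < Foo\ Bar
```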

Great.  Works for files with spaces in their names.  Now we try some other
file names.  (File names like this are abundant in MacOS X.)

    file.name <- "Black & White Minstrels/1972"

	Whoops.  wc -l < Black\ &\ White\ Minstrels/1972
	forks off "wc -l <Black\ " and then tries to run
	"\ White\ Minstrels/1972".

    file.name <- "Quake(R)/scores"

	Whoops.  "Badly placed ()'s".

    file.name <- "Drunkard's walk/log-1"

	Whoops.  "Unmatched '"

So try again.

    for.system <-
	function (s) gsub("([][)(}{'\";&! \t])", "\\\\\\1", s)

    line.count <-
	function (s) as.numeric(system(paste("wc -l <", for.system(s)), TRUE))
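
Printing the escaped names, rather than running them, shows what this version
does to the three troublesome names above.  The fourth name is an assumption
added for illustration: it contains a `$`, which is not in the character
class and so passes through untouched.

```r
# The extended version from above: backslash-escape each listed character.
for.system <- function (s) gsub("([][)(}{'\";&! \t])", "\\\\\\1", s)

escaped <- for.system(c("Black & White Minstrels/1972",
                        "Quake(R)/scores",
                        "Drunkard's walk/log-1",
                        "cost in $HOME"))   # illustrative name, not from above
writeLines(escaped)
# Black\ \&\ White\ Minstrels/1972
# Quake\(R\)/scores
# Drunkard\'s\ walk/log-1
# cost\ in\ $HOME
```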

This _still_ isn't perfect, but it is a whole lot better than the naive
version.  The major remaining problem is that the set of special characters
and the quoting mechanism need to be changed for Windows.  I _think_ the
Windows version should be something like

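    for.system <- function (s) {
	# which names contain a character outside the known-safe set?
	i <- grep("[^-_:.A-Za-z0-9/\\\\]", s)
	# wrap just those in double quotes; paste() is already vectorised
	s[i] <- paste("\"", s[i], "\"", sep="")
	s
    }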
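
The same idea can be tried on any platform by giving it a different name.
In this sketch for.system.win is a hypothetical name and the paths are made
up for illustration; it only prints the quoted names and runs no command.

```r
# Windows-style quoting: wrap a name in double quotes if it contains any
# character outside letters, digits, and - _ : . / \
for.system.win <- function (s) {
    i <- grep("[^-_:.A-Za-z0-9/\\\\]", s)
    s[i] <- paste("\"", s[i], "\"", sep="")
    s
}

quoted <- for.system.win(c("C:/data/scores.txt",
                           "C:\\data\\scores.txt",
                           "C:/My Documents/Foo Bar.txt"))
writeLines(quoted)
# C:/data/scores.txt
# C:\data\scores.txt
# "C:/My Documents/Foo Bar.txt"
```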

But what if a file name contains a double quote?  Until someone tells me,
I'm just going to hope it doesn't happen.  Putting the pieces together,

f% cat >"Foo Bar"
a b c
d e
f
<EOF>


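# Quote file names for the shell used by system() and pipe().
for.system <-
    if (.Platform$OS.type == "windows") {
        # Windows: wrap names containing unsafe characters in double quotes.
        function (s) {
            i <- grep("[^-_:.A-Za-z0-9/\\\\]", s)
            s[i] <- paste("\"", s[i], "\"", sep="")
            s
        }
    } else {
        # Unix: put a backslash in front of each listed special character.
        function (s) gsub("([][)(}{'\";&! \t\n])", "\\\\\\1", s)
    }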

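# Run wc on a file and return its line, word, and character counts.
wc <- function (s) {
    con <- pipe(paste("wc <", for.system(s)), open="r")
    on.exit(close(con))   # scan() leaves an already-open connection open
    r <- scan(con, n=3, quiet=TRUE)
    names(r) <- c("lines", "words", "chars")
    r
}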

> wc("Foo Bar")
lines words chars 
    3     6    12 
> system("cp $HOME/.login Drunkard\\'s\\ Walk")
> wc("Drunkard's Walk")["chars"]
chars 
 3633 
> 

If there's already something like for.system() built into R, I'd be very
happy to know about it.  (It's a little odd that system() and pipe()
don't already support something like this; in a multi-element character
vector the first could be taken literally and the remaining ones could be
taken quoted with leading spaces.)
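
For what it is worth, current versions of base R do have a function of this
kind, shQuote(), which wraps a string in quotes suitable for a given shell.
A minimal sketch, assuming a Unix-like system with wc on the PATH;
line.count.sq is a hypothetical name used only here, and type="sh" is passed
explicitly so the quoting does not depend on the platform default.

```r
# Hypothetical helper: count lines with wc, letting shQuote() do the quoting.
# Assumes a Bourne-style shell, which is what system() uses on Unix-alikes.
line.count.sq <- function (s)
    as.numeric(system(paste("wc -l <", shQuote(s, type="sh")), intern=TRUE))

# What the shell is handed for two awkward names.
writeLines(shQuote("Foo Bar", type="sh"))           # 'Foo Bar'
writeLines(shQuote("Drunkard's walk", type="sh"))   # 'Drunkard'\''s walk'
```

Single-quoting the whole name sidesteps the need to list every special
character, since only the single quote itself has to be handled specially.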



