[Rd] Pipe / Fork: Partial Solution / Providing Connections from C?

Jan T. Kim jtk at cmp.uea.ac.uk
Fri Feb 11 21:42:43 CET 2005

On Fri, Feb 11, 2005 at 02:32:20PM +0100, Peter Dalgaard wrote:
> "Jan T. Kim" <jtk at cmp.uea.ac.uk> writes:
> > > Well, that is probably reasonably easy, but (not the least due to that
> > > fact) I'm still surprised that it has not been done already. I can hardly
> > > imagine that I'm the first one to want to use some external utility from
> > > an R program in this way.
> > > 
> > > So, what do you R-devel folks do in this case, and what would you
> > > recommend?
> > 
> > I'm still curious about this one. If there really is no way of running
> > stuff through external filter processes in R, I'd volunteer to add
> > that.
> > 
> > Best regards & thanks in advance, Jan
> If you know how, please do. I have a suspicion it might not be as easy
> as it sounds because of the producer/consumer aspects. Notice, though,
> that in most cases you can get by with system() or pipe() and a
> temporary file for either the input or the output.

Personally, I see filtering as one process. Collecting input in a file,
then filtering that into an output file, then reading that back and
carrying on with it is a more complex process that merely involves
filtering as one part. Additional complexity means that there's more
that can go wrong, which is why I dislike temporary files.

Specifically, I've seen it happen too often (including to myself) that
things went wrong because other processes were interfering with the
temporary files (in most cases, other processes running the same program).
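
For reference, the temporary-file route can be sketched in C roughly as
below. This is only an illustration under assumptions: the function name
filter_via_tempfile is made up, the /tmp location is assumed to exist,
and the command is trusted shell text. mkstemp() picks a unique name and
creates the file atomically, which avoids the collisions between
concurrent runs that I described, but the extra steps are still there.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: filter `input` through the shell command `cmd`
 * using a temporary file for the input and popen() for the output.
 * Stores at most outsize-1 bytes in `out` (NUL-terminated).
 * Returns the number of bytes stored, or -1 on error. */
static long filter_via_tempfile(const char *cmd, const char *input,
                                char *out, size_t outsize)
{
    char path[] = "/tmp/filterXXXXXX";   /* assumed temp directory */
    char full[1024];
    size_t len = strlen(input), got;
    FILE *p;
    int n;

    int fd = mkstemp(path);              /* unique name, created atomically */
    if (fd < 0)
        return -1;
    if (write(fd, input, len) != (ssize_t)len) {
        close(fd);
        unlink(path);
        return -1;
    }
    close(fd);

    /* Build "cmd < path"; the generated path has no shell metacharacters. */
    n = snprintf(full, sizeof full, "%s < %s", cmd, path);
    if (n < 0 || (size_t)n >= sizeof full) {
        unlink(path);
        return -1;
    }

    p = popen(full, "r");                /* read the filter's stdout */
    if (!p) {
        unlink(path);
        return -1;
    }
    got = fread(out, 1, outsize - 1, p);
    out[got] = '\0';
    pclose(p);
    unlink(path);                        /* clean up the temporary file */
    return (long)got;
}
```

Calling filter_via_tempfile("sed -e 's/,$//'", text, buf, sizeof buf)
would strip the trailing commas from each line of text.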

> I remember speculating about these matters when I was first introduced
> to pipes in C: They'd show you how to open a pipe for reading and how
> to do it for writing, but not how to do both with the same process.
> Took me a while to realize that there is a nontrivial deadlock issue
> if you try to write to a process that itself is blocked trying to
> write its output. Now that is of course not to say that it cannot be
> done with clever multiplexing and buffering techniques -- or
> multithreading, except that R isn't threaded.

It's clear to me that for real dynamic filtering, you need two processes
(or threads). This requires that the operating system supports forking,
i.e. that the fork package works. Without that, filtering is not
possible, at least not in any way I'm aware of.
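
To make the two-process idea concrete, here is a rough POSIX C sketch.
It is not code from R's sources; the name filter_with_fork and the use
of /bin/sh to run the command are my assumptions. One child becomes the
filter, a second child does nothing but write the input, and the parent
only reads. Since writing and reading happen in different processes,
neither side can block the other in the way Peter describes.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run `cmd` as a filter, feed it `input` from a
 * forked writer process, and collect its output in the parent.
 * Returns the number of bytes stored in `out`, or -1 on error. */
static long filter_with_fork(const char *cmd, const char *input,
                             char *out, size_t outsize)
{
    int to_filter[2], from_filter[2];
    pid_t filter, writer;
    size_t got = 0;
    ssize_t r;

    if (pipe(to_filter) < 0)
        return -1;
    if (pipe(from_filter) < 0) {
        close(to_filter[0]); close(to_filter[1]);
        return -1;
    }

    filter = fork();
    if (filter == 0) {                    /* child 1: becomes the filter */
        dup2(to_filter[0], STDIN_FILENO);
        dup2(from_filter[1], STDOUT_FILENO);
        close(to_filter[0]);   close(to_filter[1]);
        close(from_filter[0]); close(from_filter[1]);
        execl("/bin/sh", "sh", "-c", cmd, (char *)NULL);
        _exit(127);                       /* only reached if exec failed */
    }

    writer = fork();
    if (writer == 0) {                    /* child 2: feeds the filter */
        size_t len = strlen(input), done = 0;
        close(to_filter[0]);
        close(from_filter[0]); close(from_filter[1]);
        while (done < len) {
            ssize_t w = write(to_filter[1], input + done, len - done);
            if (w <= 0)
                _exit(1);
            done += (size_t)w;
        }
        close(to_filter[1]);              /* EOF for the filter */
        _exit(0);
    }

    /* parent: keep only the read end of the filter's output */
    close(to_filter[0]); close(to_filter[1]);
    close(from_filter[1]);
    while (got < outsize - 1 &&
           (r = read(from_filter[0], out + got, outsize - 1 - got)) > 0)
        got += (size_t)r;
    out[got] = '\0';
    close(from_filter[0]);

    if (filter > 0) waitpid(filter, NULL, 0);
    if (writer > 0) waitpid(writer, NULL, 0);
    return (filter < 0 || writer < 0) ? -1 : (long)got;
}
```

Note that every process closes the pipe ends it does not use. If the
parent kept the write end of to_filter open, the filter would never see
end-of-file and the read loop would hang.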

So, my plan would be to add some function to src/main/connections.c for
setting up a pipe running through an external command and returning the
write and read connections for use in the R program. Then, one could do
something like (modelled after the pipe example in the base docs):

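    data2 <- c(
      "450, 390, 467, 654,  30, 542, 334, 432, 421,",
      "357, 497, 493, 550, 549, 467, 575, 578, 342,",
      "446, 547, 534, 495, 979, 479")
    fp <- filterpipe("sed -e s/,$//")   # proposed function, does not exist yet
    pid <- fork(slave = NULL)           # fork package; returns 0 in the child
    if (pid == 0) {
      # child: feed the filter, then quit without falling through to scan()
      write(data2, file = fp$write)
      close(fp$write)
      exit()                            # exit() as provided by the fork package
    } else {
      # parent: drop its copy of the write end so sed sees EOF, then read
      close(fp$write)
      x <- scan(fp$read, sep = ",")
    }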

Thinking about your buffering suggestion, it occurs to me that it *may*
be possible to create two anonymous files (of the file("") type) and
to connect these to the stdin and the stdout of an external process.
In fact, a couple of days ago I checked whether pipe() would perhaps
accept optional file arguments for specifying the external process'
stdin and stdout, so that I could write e.g.

    f <- file("");
    p <- pipe("sed -e s/,$//", stdin = f);
    write(data2, file = f);

but that turned out to be another detour on the way that took me here...
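
At the C level, the anonymous-file idea might look something like the
following sketch. Again, this is an assumption-laden illustration, not
existing R code: filter_via_anon_files is a made-up name, and tmpfile()
stands in for what file("") would provide. The input is written
completely before the external process starts and the output is read
only after it has finished, so the files do all the buffering and no
second R process is needed. The price is that nothing is streamed.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run `cmd` with two anonymous temporary files
 * attached to its stdin and stdout.
 * Returns the number of bytes stored in `out`, or -1 on error. */
static long filter_via_anon_files(const char *cmd, const char *input,
                                  char *out, size_t outsize)
{
    FILE *in = tmpfile();                 /* anonymous: no visible name */
    FILE *res = tmpfile();
    size_t len = strlen(input), got = 0;
    long result = -1;
    int in_fd, res_fd;
    ssize_t r;
    pid_t pid;

    if (!in || !res)
        goto done;
    in_fd = fileno(in);
    res_fd = fileno(res);

    /* Store the whole input first, then rewind so the child reads it
     * from the start (the file offset is shared across fork). */
    if (write(in_fd, input, len) != (ssize_t)len)
        goto done;
    lseek(in_fd, 0, SEEK_SET);

    pid = fork();
    if (pid < 0)
        goto done;
    if (pid == 0) {                       /* child: becomes the filter */
        dup2(in_fd, STDIN_FILENO);
        dup2(res_fd, STDOUT_FILENO);
        execl("/bin/sh", "sh", "-c", cmd, (char *)NULL);
        _exit(127);                       /* only reached if exec failed */
    }
    waitpid(pid, NULL, 0);                /* filter has written everything */

    lseek(res_fd, 0, SEEK_SET);           /* rewind the output file */
    while (got < outsize - 1 &&
           (r = read(res_fd, out + got, outsize - 1 - got)) > 0)
        got += (size_t)r;
    out[got] = '\0';
    result = (long)got;

done:
    if (in) fclose(in);
    if (res) fclose(res);
    return result;
}
```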

Best regards, Jan
 +- Jan T. Kim -------------------------------------------------------+
 |    *NEW*    email: jtk at cmp.uea.ac.uk                               |
 |    *NEW*    WWW:   http://www.cmp.uea.ac.uk/people/jtk             |
 *-----=<  hierarchical systems are for files, not for humans  >=-----*
