[Rd] system/system2 and open file descriptors

Winston Chang winstonchang1 at gmail.com
Thu Apr 20 05:40:41 CEST 2017

In addition to the issue of a child process holding onto open files, the
child process can also manipulate a file descriptor in a way that affects
the parent process. For example, calling lseek() in the child process will
move the file offset in the parent process.

Here is a set of commands that demonstrates it. They can be copied and
pasted into a terminal. What it does:
- Creates a C program that seeks to the beginning of file descriptor 3, and
  compiles it to a program named "lseek".
- Creates a file with some text in it.
- Starts R. In R:
    - Opens the text file and reads the first line.
    - Runs lseek in a child process.
    - Reads the rest of the lines.

echo "#include <unistd.h>
int main(void) {
  lseek(3, 0, SEEK_SET);
}" > lseek.c

gcc lseek.c -o lseek

echo "line 1
line 2
line 3" > lines.txt

R
f <- file('lines.txt', 'r')
cat(readLines(f, n = 1), sep = "\n")
system('./lseek')
cat(readLines(f), sep = "\n")

Here's what it outputs:
> f <- file('lines.txt', 'r')
> cat(readLines(f, n = 1), sep = "\n")
line 1
> system('./lseek')
> cat(readLines(f), sep = "\n")
line 2
line 3
line 1
line 2
line 3

The child process has changed what the parent process reads from the file.
(I'm guessing that readLines() prints out "line 2" and "line 3" before
starting over because R had already buffered the whole file before lseek
was executed. Once that buffer is used up, the next read from the
descriptor starts at the offset the child reset to 0.)

This is obviously a highly contrived case, but it illustrates what's
possible. The other issue I mentioned, with child processes holding open
files after the R process exits, is more likely to cause problems in the
real world. That's actually how I encountered this issue in the first
place: when restarting R inside of RStudio on a Mac, if there are any
extant child processes started by system(), they keep some files open, and
this causes RStudio to hang. (There's a fix in progress for RStudio for
this particular issue.)


On Tue, Apr 18, 2017 at 3:20 PM, Winston Chang <winstonchang1 at gmail.com>
wrote:

> It seems that the system() and system2() functions don't close file
> descriptors between the fork() and exec() (on Unix platforms, of course).
> This means that the child processes inherit open files and socket
> connections.
>
> Running this (from a terminal) will result in the child process writing to
> a file that was opened by R:
>
>   R
>   f <- file('foo.txt', 'w')
>   system('echo "abc" >&3')
>
> You can also see the open files if you run the following:
>
>   f <- file('foo.txt', 'w')
>   system2('sleep', '100', wait = FALSE)
>
> And then in another terminal:
>
>   lsof -c R -c sleep
>
> It will show that both the R and sleep processes have the file open:
>
>   ...
>   R       324 root    3w   REG   0,48        0   4259 /foo.txt
>   ...
>   sleep   327 root    3w   REG   0,48        0   4259 /foo.txt
>
> This behavior can cause problems if R spawns a child process that outlives
> the R process but keeps some resources open.
>
> Would it be possible to add an option to close file descriptors for child
> processes? It would be nice if that were the default, but I suspect that
> making that change would break a lot of existing code.
>
> To take an example from the Python world, subprocess.Popen() has an
> option, close_fds, which closes all file descriptors except 0, 1, and 2.
>   https://docs.python.org/2/library/subprocess.html#popen-constructor
>
> -Winston
