[Rd] tempdir() may be deleted during long-running R session

Duncan Murdoch murdoch.duncan at gmail.com
Wed Apr 26 19:47:16 CEST 2017


On 26/04/2017 10:39 AM, Tomas Kalibera wrote:
>
> I agree this should be solved in the configuration of
> systemd/tmpreaper/whatever tmp cleaner is in use: the cleanup must be prevented
> in the configuration files of these tools. Moving session directories under
> /var/run (XDG_RUNTIME_DIR) does not seem like a good solution to me;
> sooner or later someone might come up with auto-cleaning that directory too.
>
> It might still be useful if R could sometimes detect when automated
> cleanup happened and warn the user. Perhaps a simple way could be to
> always create an empty file inside session directory, like
> ".tmp_cleaner_trap". R would never touch this file, but check its
> existence from time to time. If it gets deleted, R would issue a warning and
> ask the user to check tmp cleaner configuration. The idea is that this
> file will be the oldest one in the session directory, so would get
> cleaned up first.

Yes, I like that idea, as long as checking for its existence doesn't 
make some system think it is in use and therefore protected from deletion.
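Tomas's trap-file idea could look roughly like the following in R. This is a minimal sketch, not something R itself does: the helpers `trap_path()`, `plant_tmp_trap()` and `check_tmp_trap()` are hypothetical, and the trap is only checked when the caller asks. On the concern just raised, `file.exists()` only queries file metadata and never opens or modifies the file; whether a particular cleaner could still treat the file as "in use" is an assumption to verify per system.

```r
# Hypothetical helpers sketching the ".tmp_cleaner_trap" idea; not part of R.
# tempdir() is called without 'check', so it keeps returning the original
# session directory even after that directory has been removed.
trap_path <- function() file.path(tempdir(), ".tmp_cleaner_trap")

# Create the empty trap file once, early in the session, so that it is the
# oldest entry in the session directory.
plant_tmp_trap <- function() {
  invisible(file.create(trap_path()))
}

# Return TRUE if the trap is still present; otherwise warn and return FALSE.
# file.exists() only inspects metadata; it neither opens nor touches the file.
check_tmp_trap <- function() {
  ok <- file.exists(trap_path())
  if (!ok)
    warning("the R session temporary directory has been cleaned up, ",
            "possibly by an automatic tmp cleaner; please check its configuration")
  ok
}

plant_tmp_trap()
check_tmp_trap()  # TRUE while the session directory is intact
```

In a real session the check could be run periodically, for instance from a callback registered with `addTaskCallback()`; how often to check is a judgment call.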

Duncan Murdoch

>
> Tomas
>
>
> On 04/26/2017 02:29 PM, Duncan Murdoch wrote:
>> On 26/04/2017 4:21 AM, Martin Maechler wrote:
>>>>>>>>   <frederik at ofb.net>
>>>>>>>>     on Tue, 25 Apr 2017 21:13:59 -0700 writes:
>>>
>>>     > On Tue, Apr 25, 2017 at 02:41:58PM +0000, Cook, Malcolm wrote:
>>>     >> Might this combination serve the purpose:
>>>     >> * R session keeps an open handle on the tempdir it creates,
>>>     >> * whatever tempdir harvesting cron job the user has be made
>>>     >>   sensitive enough not to delete open files (including open directories)
>>>
>>> I also agree that the above would be ideal - if possible.
>>>
>>>     > Good suggestion but doesn't work with the (increasingly popular)
>>>     > "Systemd":
>>>
>>>     > $ mkdir /tmp/somedir
>>>     > $ touch -d "12 days ago" /tmp/somedir/
>>>     > $ cd /tmp/somedir/
>>>     > $ sudo systemd-tmpfiles --clean
>>>     > $ ls /tmp/somedir/
>>>     > ls: cannot access '/tmp/somedir/': No such file or directory
>>>
>>> Something like your example is what I'd expect to always be a
>>> possibility on some platforms, depending of course on low-level
>>> things such as root/sysadmin/... "permission" to clean up, etc.
>>>
>>> Jeroen mentioned the fact that tempdir()s can also disappear
>>> for other reasons {his example was multicore child processes,
>>> buggily(?) implemented}.
>>> Further reasons may be race conditions / user code bugs / user
>>> errors, etc.
>>> Note that the R process which created the tempdir on startup
>>> always has the permission to remove it again.  But you can also
>>> think of a full file system, etc.
>>>
>>> Current R-devel's tempdir(check = TRUE) would create a new
>>> one or give an error (and then the user should be able to use
>>>     Sys.setenv(TMPDIR = ...)
>>>     to point to a directory she has write permission for).
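A short demonstration of the facility described here, assuming R >= 3.5.0 (the release in which `tempdir()` gained its `check` argument). The `unlink()` call stands in for an external tmp cleaner; the recreated directory generally gets a new random name rather than the old one.

```r
# Requires R >= 3.5.0 for tempdir(check = TRUE).
old <- tempdir()
unlink(old, recursive = TRUE)   # simulate an external tmp cleaner
dir.exists(old)                 # should now be FALSE: the directory is gone
tempdir()                       # without 'check', still reports the stale path
new <- tempdir(check = TRUE)    # notices the missing directory and recreates one
dir.exists(new)                 # TRUE once recreation has succeeded
```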
>>>
>>> Gabe's point is of course important too: if you have a long-running
>>> process that uses a tempfile,
>>> and "big brother" has removed the whole tempdir(), you will
>>> be "unhappy" in any case.
>>> Trying to prevent big brother from doing that in all cases seems
>>> "not easy".
>>>
>>> I did want to provide an easy solution to the OP's situation:
>>> suddenly tempdir() is gone, and quite a few things stop working
>>> in the current R process {he mentioned help(), e.g.}.
>>> With the new tempdir(check = TRUE) facility, code could be changed
>>> to replace
>>>
>>>    tempfile("foo")
>>>
>>> either by
>>>    tempfile("foo", tmpdir=tempdir(check=TRUE))
>>>
>>> or by something like
>>>
>>>    tryCatch(tempfile("foo"),
>>>             error = function(e)
>>>                 tempfile("foo", tmpdir = tempdir(check = TRUE)))
>>>
>>> or be even more sophisticated.
>>>
>>> We could also consider allowing   check =  TRUE | NA | FALSE
>>>
>>> and make  NA  the default and have that correspond to
>>> check =TRUE  but additionally do the equivalent of
>>>    warning("tempdir() has become invalid and been recreated")
>>> in case the tempdir() had been invalid.
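The proposed `check = NA` behaviour can be emulated with a small wrapper; the name `tempdir_warn()` is hypothetical and this is a sketch assuming R >= 3.5.0, not how base R behaves. One caveat about the `tryCatch()` variant above: `tempfile()` only constructs a file name and does not check that the directory exists, so its error handler would typically not fire; the failure tends to surface only when the file is actually written. Checking the directory explicitly, as below, avoids relying on an error that may never come.

```r
# Hypothetical helper emulating the proposed 'check = NA' semantics:
# recreate tempdir() when it has vanished, but warn so the event is visible.
# Requires R >= 3.5.0 for tempdir(check = TRUE).
tempdir_warn <- function() {
  if (dir.exists(tempdir()))
    return(tempdir())
  d <- tempdir(check = TRUE)   # recreate first, so the warning cannot prevent it
  warning("tempdir() has become invalid and been recreated")
  d
}

# Usage: pass the checked directory explicitly to tempfile().
f <- tempfile("foo", tmpdir = tempdir_warn())
writeLines("example", f)
```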
>>>
>>>     > I would advocate just changing 'tempfile()' so that it recreates the
>>>     > directory where the file is (the "dirname") before returning the file
>>>     > path. This would have fixed the issue I ran into. Changing 'tempdir()'
>>>     > to recreate the directory is another option.
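Frederick's proposal could be sketched as a wrapper around `tempfile()`; the name `tempfile_mkdir()` is hypothetical and base R's `tempfile()` does not do this. Unlike `tempdir(check = TRUE)`, it recreates the original directory path, and, as Duncan points out below, it cannot bring back files the cleaner has already deleted.

```r
# Hypothetical wrapper: return tempfile() paths after making sure their
# parent directories exist (recreating them at the same path if needed).
tempfile_mkdir <- function(pattern = "file", tmpdir = tempdir(), fileext = "") {
  paths <- tempfile(pattern, tmpdir = tmpdir, fileext = fileext)
  # dir.create() takes a single path, so handle each distinct parent once;
  # showWarnings = FALSE hides the warning when the directory already exists.
  for (d in unique(dirname(paths)))
    dir.create(d, recursive = TRUE, showWarnings = FALSE)
  paths
}
```

For example, `writeLines("x", tempfile_mkdir("foo"))` should succeed even after the session directory has been removed, because the wrapper recreates it before the file is written.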
>>>
>>> In the end I had decided that
>>>
>>>       tempfile("foo", tmpdir = tempdir(check = TRUE))
>>>
>>> is actually better self-documenting than
>>>
>>>       tempfile("foo", checkDir = TRUE)
>>>
>>> which was my first inclination.
>>>
>>> Note again that currently, the checking is _off_ by default.
>>> I've just provided a tool -- which was relatively easy and
>>> platform independent! --- to do more (real and thought)
>>> experiments.
>>
>> This seems like the wrong approach.  The problem occurs as soon as the
>> tempdir() gets cleaned up:  there could be information in temp files
>> that gets lost at that point.  So the solution should be to prevent
>> the cleanup, not to continue on after it has occurred (as "check =
>> TRUE" does).  This follows the principle that it's better for the
>> process to always die than to sometimes silently produce incorrect
>> results.
>>
>> Frederick posted the way to do this in systems using systemd.  We
>> should be putting that in place, or the equivalent on systems using
>> other tempfile cleanups.  This looks to me like something that "make
>> install" should do, or perhaps it should be done by people putting
>> together packages for specific systems.
>>
>> Duncan Murdoch
>>
>> ______________________________________________
>> R-devel at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-devel
>
>


