[Rd] Comments requested on "changedFiles" function

Scott Kostyshak skostysh at princeton.edu
Mon Sep 9 02:31:07 CEST 2013


On Sun, Sep 8, 2013 at 10:55 AM, Duncan Murdoch
<murdoch.duncan at gmail.com> wrote:
> On 13-09-06 11:07 PM, Karl Millar wrote:
>>
>> On Fri, Sep 6, 2013 at 7:03 PM, Duncan Murdoch <murdoch.duncan at gmail.com>
>> wrote:
>>>
>>> On 13-09-06 9:21 PM, Karl Millar wrote:
>>>>
>>>>
>>>> Hi Duncan,
>>>>
>>>> I like the interface of this version a lot better, but there's still a
>>>> bunch of implementation details that need fixing:
>>>>
>>>> * As previously mentioned, there are important cases where the mtime
>>>> values change in ways that this code doesn't detect.
>>>> * If the timestamp file (which is usually in the temp directory) gets
>>>> deleted (which can happen after a moderate amount of time of
>>>> inactivity on some systems), then the file_test('-nt', ...) will
>>>> always return false, even if the file has changed.
>>>
>>>
>>>
>>> If that happened without user intervention, I think it would break other
>>> things in R -- the temp directory is supposed to last for the whole
>>> session.
>>> But I should be checking anyway.
>>
>>
>> Yes, it does break other things in R -- my experience has been that
>> the help system seems to be the one that is impacted the most by this.
>>   FWIW, I've never seen the entire R temp directory deleted, just
>> individual files and subdirectories in it, but even that probably
>> depends on how the machine is configured.  I suspect only a few users
>> ever notice this, but my R use is probably somewhat anomalous and I
>> think it only happens to R sessions that I haven't used for a few
>> days.
>
>
> I use Windows and never see this; deleting temp files is up to me, not to
> the system.  But my understanding was the *nix systems should only clean up
> /tmp on restart, and I don't think an R session will survive a restart.
>
> However, you have convinced me that the use of the timestamp file is not
> beneficial enough to be the default.  I'll leave it as an option, but add
> warnings that it might be unreliable.
>
>
>>
>>>> * If files get added or deleted between the two calls to list.files in
>>>> fileSnapshot, it will fail with an error.
>>>
>>>
>>>
>>> Yours won't work if path contains more than one directory.  This is
>>> probably
>>> a reasonable restriction, but it's inconsistent with list.files, so I'd
>>> like
>>> to avoid it if I can find a way.
>>
>>
>> I'm currently unsure what the behaviour when comparing snapshots with
>> multiple directories should be.
>>
>> Presumably we should have the property that (horribly abusing notation
>> for succinctness):
>>        compareSnapshots(c(a1, a2),  c(a1, a2))
>> is the same as concatenating (in some form)
>>        compareSnapshots(a1, a1) and compareSnapshots(a2, a2)
>> and there's a bunch of ways we could concatenate -- we could return a
>> list of results, or a single result where each of the 'added, deleted,
>> modified' fields are a list, or where we concatenate the 'added,
>> deleted, modified' fields together into three simple vectors.
>> Concatenating the vectors together like this is appealing, but unless
>> you're using the full names, it doesn't include the information of
>> which directory the changes are in, and using the full names doesn't
>> work in the case where you're comparing different sets of directories,
>> e.g. compareSnapshots(c(a1, a2), c(b1, b2)), where there is no
>> sensible choice for a full name.  The list options don't have this
>> problem, but are harder to work with, particularly for the common case
>> where there's only a single directory.  You'd also have to be somewhat
>> careful with filenames that occur in both directories.
>>
>> Maybe I'm just being dense, but I don't see a way to do this thats
>> clear, easy to use and wouldn't confuse users at the moment.
>
>
> The way I've done this is to require full.names when multiple dirs are on
> the path.  I've reduced it to one list.files() call per dir, by iterating
> over the path variable and using your approach of calling it with full.names
> = FALSE, then adding the dir if necessary.
>
> I haven't adopted your change that forces comparison of only size and mtime
> from file.info.  I don't see a big cost in storing whatever file.info
> returns (which is system dependent; on Windows I don't see the user and
> group related columns; on Unix I don't see the exe column).
> Users might want to detect changes to anything there, and I shouldn't make
> it harder for them.
>
> I've also kept the special-casing of md5sum; it really needs to be wrapped
> in suppressWarnings() (on Unix only).  And I've kept the options to specify
> what changedFiles checks among the file.info columns; I can see that you
> might want a snapshot with everything, but sometimes only want to be told
> about changes in a subset of the attributes.
>
> I've uploaded
> <http://www.stats.uwo.ca/faculty/murdoch/temp/testpkg_1.1.tar.gz> if anyone
> is interested.

Works well.

Scott


--
Scott Kostyshak
Economics PhD Candidate
Princeton University



More information about the R-devel mailing list