[Rd] Comments requested on "changedFiles" function

Duncan Murdoch murdoch.duncan at gmail.com
Sat Sep 7 04:03:00 CEST 2013


On 13-09-06 9:21 PM, Karl Millar wrote:
> Hi Duncan,
>
> I like the interface of this version a lot better, but there are still a
> bunch of implementation details that need fixing:
>
> * As previously mentioned, there are important cases where the mtime
> values change in ways that this code doesn't detect.
> * If the timestamp file (which is usually in the temp directory) gets
> deleted (which can happen after a moderate period of inactivity on
> some systems), then file_test('-nt', ...) will always return FALSE,
> even if the file has changed.

If that happened without user intervention, I think it would break other 
things in R -- the temp directory is supposed to last for the whole 
session.  But I should be checking anyway.
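
A minimal sketch of such a check, assuming the snapshot keeps the path of its timestamp file. The helper name newer_than_stamp is hypothetical and is not part of the test package:

```r
# Hypothetical helper (not in the test package): compare files against a
# timestamp file, but refuse to answer if the timestamp file has vanished.
newer_than_stamp <- function(file, stamp) {
  if (!file.exists(stamp)) {
    # file_test("-nt", ...) would silently return FALSE here, which is
    # indistinguishable from "unchanged"; return NA with a warning instead.
    warning("timestamp file is missing: ", stamp)
    return(rep(NA, length(file)))
  }
  utils::file_test("-nt", file, stamp)
}
```

Returning NA rather than FALSE lets the caller tell "unknown" apart from "not changed".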

> * If files get added or deleted between the two calls to list.files in
> fileSnapshot, it will fail with an error.

Yours won't work if path contains more than one directory.  This is 
probably a reasonable restriction, but it's inconsistent with 
list.files, so I'd like to avoid it if I can find a way.
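
For reference, a small sketch of the list.files behaviour I mean: it accepts a vector of directories, so a snapshot keyed on full names can cover several at once. The directory and file names below are made up for illustration:

```r
# Two throwaway directories, each holding one file (illustrative names only).
d1 <- tempfile("dirA"); d2 <- tempfile("dirB")
dir.create(d1); dir.create(d2)
invisible(file.create(file.path(d1, "a.txt"), file.path(d2, "b.txt")))

# list.files() takes a vector of paths and returns the files from all of them.
# full.names = TRUE keeps the results unambiguous when the same file name
# appears in more than one directory.
files <- list.files(c(d1, d2), full.names = TRUE)
basename(files)
```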

Duncan Murdoch

> * If the path is on a remote file system, tempdir is local, and
> there's significant clock skew, then you can get incorrect results.
>
> Unfortunately, these aren't just theoretical scenarios -- I've had the
> misfortune to run up against all of them in the past.
>
> I've attached code that's loosely based on your implementation that
> solves these problems AFAICT.  Alternatively, Hadley's code handles
> all of these correctly, with the exception that compare_state doesn't
> handle the case where safe_digest returns NA very well.
>
> Regards,
>
> Karl
>
> On Fri, Sep 6, 2013 at 5:40 PM, Duncan Murdoch <murdoch.duncan at gmail.com> wrote:
>> On 13-09-06 7:40 PM, Scott Kostyshak wrote:
>>>
>>> On Fri, Sep 6, 2013 at 3:46 PM, Duncan Murdoch <murdoch.duncan at gmail.com>
>>> wrote:
>>>>
>>>> On 06/09/2013 2:20 PM, Duncan Murdoch wrote:
>>>>>
>>>>>
>>>>> I have now put the code into a temporary package for testing; if anyone
>>>>> is interested, for a few days it will be downloadable from
>>>>>
>>>>> fisher.stats.uwo.ca/faculty/murdoch/temp/testpkg_1.0.tar.gz
>>>>
>>>>
>>>>
>>>> Sorry, error in the URL.  It should be
>>>>
>>>> http://www.stats.uwo.ca/faculty/murdoch/temp/testpkg_1.0.tar.gz
>>>
>>>
>>> Works well. A couple of things I noticed:
>>>
>>> (1)
>>> md5sum is being called on directories, which causes warnings. (If this
>>> is not viewed as undesirable, please ignore the rest of this comment.)
>>> Should this be the responsibility of the user (by passing arguments to
>>> list.files)? In the example, changing
>>> fileSnapshot(dir, file.info=TRUE, md5sum=TRUE)
>>> to
>>> fileSnapshot(dir, file.info=TRUE, md5sum=TRUE, include.dirs=FALSE,
>>> recursive=TRUE)
>>>
>>> gets rid of the warnings. But perhaps the user just wants to exclude
>>> directories for the md5sum calculations. This can't be controlled from
>>> fileSnapshot.
>>
>>
>> I don't see the warnings, I just get NA values.  I'll try to see why there's
>> a difference.  (One possibility is my platform (Windows); another is that
>> I'm generally testing in R-patched and R-devel rather than the 3.0.1 release
>> version.)  I would rather suppress the warnings than make the user avoid
>> them.
>>
>>
>>>
>>> Or, should the "if (md5sum)" chunk subset "fullnames" using file_test
>>> or file.info to exclude directories (and then fill in the directories
>>> with NA)?
>>>
>>> (2)
>>> If I run example(changedFiles) several times, sometimes I get:
>>>
>>> chngdF> changedFiles(snapshot)
>>> File changes:
>>>         mtime md5sum
>>> file2  TRUE   TRUE
>>>
>>> and other times I get:
>>>
>>> chngdF> changedFiles(snapshot)
>>> File changes:
>>>         md5sum
>>> file2   TRUE
>>>
>>> I wonder why.
>>
>>
>> Sometimes the example runs so quickly that the new version has exactly the
>> same modification time as the original.  That's the risk of the mtime check.
>> If you put a delay between the two writes, you'll get consistent results.
>>
>> Duncan Murdoch
>>
>>
>>>
>>> Scott
>>>
>>> > sessionInfo()
>>>
>>> R Under development (unstable) (2013-08-31 r63780)
>>> Platform: x86_64-unknown-linux-gnu (64-bit)
>>>
>>> locale:
>>>    [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
>>>    [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
>>>    [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
>>>    [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
>>>    [9] LC_ADDRESS=C               LC_TELEPHONE=C
>>> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>>>
>>> attached base packages:
>>> [1] stats     graphics  grDevices utils     datasets  methods   base
>>>
>>> other attached packages:
>>> [1] testpkg_1.0
>>>
>>> loaded via a namespace (and not attached):
>>> [1] tools_3.1.0
>>>>
>>>>
>>>
>>>
>>> --
>>> Scott Kostyshak
>>> Economics PhD Candidate
>>> Princeton University
>>>
>>
>> ______________________________________________
>> R-devel at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-devel

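On Scott's point (1) about md5sum being called on directories, one possible approach is sketched here: compute checksums for regular files only and fill in NA for everything else. The helper name safe_md5 is hypothetical, and this is not necessarily how fileSnapshot will end up doing it:

```r
# Hypothetical helper: md5 checksums for regular files, NA for directories
# and for paths that do not exist.
safe_md5 <- function(paths) {
  out <- rep(NA_character_, length(paths))
  names(out) <- paths
  # file_test("-f", ...) is TRUE only for existing non-directories.
  is_file <- utils::file_test("-f", paths)
  out[is_file] <- tools::md5sum(paths[is_file])
  out
}
```

Because md5sum is never handed a directory, the result no longer depends on whether a given platform warns or quietly returns NA.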

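On Scott's point (2), the mtime risk can be illustrated with a short sketch that rewrites a file immediately and compares both indicators. The variable names are illustrative only:

```r
f <- tempfile()
writeLines("first version", f)
before <- list(mtime = file.info(f)$mtime, md5 = unname(tools::md5sum(f)))

# Rewrite straight away; on a coarse clock or file system this can land on
# the same timestamp as the first write. Adding Sys.sleep() before this
# line makes the mtime comparison reliable.
writeLines("second version", f)
after <- list(mtime = file.info(f)$mtime, md5 = unname(tools::md5sum(f)))

mtime_changed <- after$mtime != before$mtime  # may be TRUE or FALSE
md5_changed   <- after$md5 != before$md5      # contents differ, so TRUE
```

The checksum catches the change either way, at the cost of reading the whole file.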
