[Rd] list_files() memory corruption?

Seth Falcon seth at userprimary.net
Sat Mar 20 22:03:18 CET 2010


On 3/20/10 1:36 PM, Alistair Gee wrote:
> I fixed my build problems. I also noticed that my patch wasn't
> correct, so I have attached a new version.
> 
> This fix still grows the vector by doubling it until it is big enough,
> but the length is reset to the correct size at the end once it is
> known.
> 
> This fix differs from the existing fix in subversion in the following scenario:
> 
> 1.Create file Z in directory with 1 other file named Y
> 2. Call dir() to retrieve list of files.
> 3. dir() counts 2 files.
> 4. While dir() is executing, some other process creates file X in the directory.
> 5. dir() retrieves the list of files, stopping after 2 files. But by
> chance, it retrieves files X and Y (but not Z).
> 6. dir() returns files X and Y, which could be misinterpreted to mean
> that file Z does not exist.
> 
> In contrast, with the attached fix, dir() would return all 3 files.

I think the scenario you describe could happen with either version.
Once you've read the files in a directory, all bets are off.  Anything
could happen between the time you readdir() and return results back to
the user.  I agree, though, that avoiding two calls to readdir narrows
the window.

> Also, the existing fix in subversion doesn't seem to handle the case
> where readdir() returns fewer files than was originally counted as it
> doesn't decrease the length of the vector.

Yes, that's a limitation of the current fix.

Have you run 'make check-devel' with your patch applied?  Have you run
any simple tests for using dir() or list.files() with recursive=TRUE on
a reasonably large directory and compared times and memory use reported
by gc()?

It is often the case that writing the patch is the easy/quick part and
making sure that one hasn't introduced new infelicities or unintended
behavior is the hard part.

I will try to take another look at your latest patch.

+ seth



More information about the R-devel mailing list