[Rd] list_files() memory corruption?

Alistair Gee alistair.gee at gmail.com
Sat Mar 20 21:36:07 CET 2010


On Wed, Mar 17, 2010 at 10:59 AM, Alistair Gee <alistair.gee at gmail.com> wrote:
> On Wed, Mar 17, 2010 at 9:42 AM, Seth Falcon <seth at userprimary.net> wrote:
>>
>> Hmm, I see that you "grow" the vector containing filenames by calling
>> lengthgets and doubling the length.  I don't see where you clean up
>> before returning -- seems likely you will end up returning a vector that
>> is too long.
>>
>> And there are some performance characteristics to consider in terms of
>> both run time and memory profile.  Does making a single pass through the
>> files make up for the allocations/data copying that result from
>> lengthgets?  Is it worth possibly requiring twice the memory for the
>> worst case?
>>
>> + seth
>>
> Sorry, I left out a call to shorten the vector (via a final call to
> lengthgets()). See new patch.
>
> BTW, I modeled this code after code I found in the RODBC package that
> handles rows returned from a database query. I don't know if this is a
> typical approach to reallocation. Maybe there is a better way of
> extending the vector, though wouldn't most of the memory usage be in
> the strings (of the filenames) rather than the STRSXP vector itself?
>
> Anyway, I'm offering this (untested) fix as it handles both
> directories that have grown and directories that have shrunk, so that
> the length of the vector is correct in both cases.
>

I fixed my build problems. I also noticed that my patch wasn't
correct, so I have attached a new version.

This fix still grows the vector by doubling it until it is big enough,
but the length is reset to the correct size at the end once it is
known.
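
For readers without the attachment, here is a rough sketch of the
grow-by-doubling-then-trim idea. It is not the patch itself: it assumes a
POSIX system, uses plain malloc()/realloc() where the patch uses
lengthgets() on a STRSXP, and the function name list_dir_single_pass() is
made up for illustration.

```c
#include <assert.h>
#include <dirent.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not taken from the patch: list the entries of
 * `path` in a single pass. On success returns a malloc'ed array of
 * malloc'ed names and stores the entry count in *n_out. Returns NULL
 * (with *n_out == 0) if the directory cannot be opened or memory runs out. */
char **list_dir_single_pass(const char *path, size_t *n_out)
{
    *n_out = 0;

    DIR *dir = opendir(path);
    if (dir == NULL)
        return NULL;

    size_t cap = 4;   /* arbitrary starting capacity */
    size_t n = 0;     /* number of names stored so far */
    char **names = malloc(cap * sizeof *names);
    if (names == NULL) {
        closedir(dir);
        return NULL;
    }

    struct dirent *de;
    while ((de = readdir(dir)) != NULL) {
        /* Skip the "." and ".." entries. */
        if (strcmp(de->d_name, ".") == 0 || strcmp(de->d_name, "..") == 0)
            continue;

        if (n == cap) {
            /* Full: double the capacity (stand-in for lengthgets()). */
            char **tmp = realloc(names, 2 * cap * sizeof *names);
            if (tmp == NULL)
                goto fail;
            names = tmp;
            cap *= 2;
        }

        size_t len = strlen(de->d_name) + 1;
        names[n] = malloc(len);
        if (names[n] == NULL)
            goto fail;
        memcpy(names[n], de->d_name, len);
        n++;
    }
    closedir(dir);
    assert(n <= cap);

    /* Trim to the exact length now that it is known. */
    if (n > 0) {
        char **tmp = realloc(names, n * sizeof *names);
        if (tmp != NULL)
            names = tmp;
    }
    *n_out = n;
    return names;

fail:
    for (size_t i = 0; i < n; i++)
        free(names[i]);
    free(names);
    closedir(dir);
    return NULL;
}
```

The point is only that the final length comes from what readdir() actually
returned, not from an earlier count, so the result is the right size
whether the directory grew or shrank in the meantime.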

This fix differs from the existing fix in Subversion in the following scenario:

1. Create file Z in a directory with 1 other file named Y.
2. Call dir() to retrieve list of files.
3. dir() counts 2 files.
4. While dir() is executing, some other process creates file X in the directory.
5. dir() retrieves the list of files, stopping after 2 files. But by
chance, it retrieves files X and Y (but not Z).
6. dir() returns files X and Y, which could be misinterpreted to mean
that file Z does not exist.

In contrast, with the attached fix, dir() would return all 3 files.
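
To make the scenario concrete, here is a small simulation. It does not
touch the real file system: the directory is modeled as an array of names
in the order readdir() is assumed to return them (X, Y, Z, as in step 5),
and all function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Count-then-read strategy: copy at most `counted` entries, where
 * `counted` comes from an earlier pass over the directory. */
size_t read_fixed_count(const char *const *listing, size_t n_listing,
                        size_t counted, const char **out)
{
    size_t n = 0;
    for (size_t i = 0; i < n_listing && n < counted; i++)
        out[n++] = listing[i];
    return n;
}

/* Single-pass strategy: read until the listing is exhausted.
 * `out` must have room for `out_cap` entries; a real implementation
 * would grow the buffer instead of asserting. */
size_t read_until_end(const char *const *listing, size_t n_listing,
                      const char **out, size_t out_cap)
{
    size_t n = 0;
    for (size_t i = 0; i < n_listing; i++) {
        assert(n < out_cap);
        out[n++] = listing[i];
    }
    return n;
}

/* Returns 1 if `name` is among the first n entries of `names`, else 0. */
int contains(const char *const *names, size_t n, const char *name)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp(names[i], name) == 0)
            return 1;
    return 0;
}
```

With a listing of {"X", "Y", "Z"} and an earlier count of 2,
read_fixed_count() stops after X and Y and never sees Z, while
read_until_end() returns all three names.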

Also, the existing fix in Subversion doesn't seem to handle the case
where readdir() returns fewer files than were originally counted, as it
doesn't decrease the length of the vector.

