[Rd] shash in unique.c

Seth Falcon seth at userprimary.net
Fri Mar 5 19:19:36 CET 2010


On 3/5/10 4:40 AM, Matthew Dowle wrote:
> Thanks a lot.  Quick and brief responses below...
>
> "Duncan Murdoch"<murdoch at stats.uwo.ca>  wrote in message
> news:4B90F134.6070004 at stats.uwo.ca...
>> Matthew Dowle wrote:
>>> I was hoping for a 'yes', 'no', 'maybe' or 'bad idea because ...'. No
>>> response resulted in a retry() after a Sys.sleep(10 days).
>>>
>>> If its a "yes" or "maybe" then I could proceed to try it, test it, and
>>> present the test results and timings to you along with the patch.  It
>>> would be on 32bit Ubuntu first, and I would need to either buy, rent time
>>> on, or borrow a 64bit machine to be able to then test there, owing to the
>>> nature of the suggestion.
>>>
>>> If its "no", "bad idea because..." or "we were already working on it, or
>>> better",  then I won't spend any more time on it.
>>>
>>> Matthew
>>>
>>>
>>> "Matthew Dowle"<mdowle at mdowle.plus.com>  wrote in message
>>> news:hlu4qh$l7s$1 at dough.gmane.org...
>>>
>>>> Looking at shash in unique.c, from R-2.10.1  I'm wondering if it makes
>>>> sense to hash the pointer itself rather than the string it points to?
>>>> In other words could the SEXP pointer be cast to unsigned int and the
>>>> usual scatter be called on that as if it were integer?
>>>>
>> Two negative but probably not fatal issues:
>>
>> Pointers and ints are not always the same size.  In Win64, ints are 32
>> bits, pointers are 64 bits.  (Can we be sure there is some integer type
>> the same size as a pointer?  I don't know, ask a C expert.)
> No we can't be sure. But we could test at runtime, and if the assumption
> wasn't true, then revert to the existing method.

I think the idea is, on the whole, a reasonable one and would be 
inclined to apply a patch if it demonstrated some measurable performance 
improvement.

For the 32bit v 64bit issue, I think we could detect and in the 64bit 
case take something like:

    ((int)p) ^ ((int)(p >> 32))

>>
>> We might want to save the hash to disk.  On restore, the pointer based
>> hash would be all wrong.  (I don't know if we actually do ever save a hash
>> to disk.  )
> The hash table in unique.c appears to be a temporary private hash, different
> to the global R_StringHash. Its private hash appears to be used only while
> the call to unique runs, then free'd. Thats my understanding anyway. The
> suggestion is not to alter the global R_StringHash in any way at all,  which
> is the one that might be saved to disk now or in the future.
>

I agree with your reading: this is a temporary hash table and there 
would be little reason to want to save it (it is not saved now).

+ seth

-- 
Seth Falcon | @sfalcon | http://userprimary.net/user



More information about the R-devel mailing list