[Rd] shash in unique.c
seth at userprimary.net
Fri Mar 5 19:19:36 CET 2010
On 3/5/10 4:40 AM, Matthew Dowle wrote:
> Thanks a lot. Quick and brief responses below...
> "Duncan Murdoch"<murdoch at stats.uwo.ca> wrote in message
> news:4B90F134.6070004 at stats.uwo.ca...
>> Matthew Dowle wrote:
>>> I was hoping for a 'yes', 'no', 'maybe' or 'bad idea because ...'. No
>>> response resulted in a retry() after a Sys.sleep(10 days).
>>> If its a "yes" or "maybe" then I could proceed to try it, test it, and
>>> present the test results and timings to you along with the patch. It
>>> would be on 32bit Ubuntu first, and I would need to either buy, rent time
>>> on, or borrow a 64bit machine to be able to then test there, owing to the
>>> nature of the suggestion.
>>> If its "no", "bad idea because..." or "we were already working on it, or
>>> better", then I won't spend any more time on it.
>>> "Matthew Dowle"<mdowle at mdowle.plus.com> wrote in message
>>> news:hlu4qh$l7s$1 at dough.gmane.org...
>>>> Looking at shash in unique.c, from R-2.10.1 I'm wondering if it makes
>>>> sense to hash the pointer itself rather than the string it points to?
>>>> In other words could the SEXP pointer be cast to unsigned int and the
>>>> usual scatter be called on that as if it were integer?
>> Two negative but probably not fatal issues:
>> Pointers and ints are not always the same size. In Win64, ints are 32
>> bits, pointers are 64 bits. (Can we be sure there is some integer type
>> the same size as a pointer? I don't know, ask a C expert.)
> No we can't be sure. But we could test at runtime, and if the assumption
> wasn't true, then revert to the existing method.
I think the idea is, on the whole, a reasonable one and would be
inclined to apply a patch if it demonstrated some measurable performance
For the 32bit v 64bit issue, I think we could detect and in the 64bit
case take something like:
((int)p) ^ ((int)(p >> 32))
>> We might want to save the hash to disk. On restore, the pointer based
>> hash would be all wrong. (I don't know if we actually do ever save a hash
>> to disk. )
> The hash table in unique.c appears to be a temporary private hash, different
> to the global R_StringHash. Its private hash appears to be used only while
> the call to unique runs, then free'd. Thats my understanding anyway. The
> suggestion is not to alter the global R_StringHash in any way at all, which
> is the one that might be saved to disk now or in the future.
I agree with your reading: this is a temporary hash table and there
would be little reason to want to save it (it is not saved now).
Seth Falcon | @sfalcon | http://userprimary.net/user
More information about the R-devel