[Rd] shash in unique.c

Matthew Dowle mdowle at mdowle.plus.com
Mon Feb 22 15:37:04 CET 2010


Looking at shash in unique.c, from R-2.10.1, I'm wondering if it makes sense 
to hash the pointer itself rather than the string it points to.
In other words, could the SEXP pointer be cast to unsigned int and the usual 
scatter be called on that as if it were an integer?

shash would then look like a slightly modified version of ihash, like this:

static int shash(SEXP x, int indx, HashData *d)
{
    if (STRING_ELT(x,indx) == NA_STRING) return 0;
    return scatter((unsigned int) STRING_ELT(x,indx), d);
}
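
One detail of the cast above: on platforms where pointers are 64 bits wide and
unsigned int is 32, the cast keeps only the low half of the address. Below is a
minimal standalone sketch of hashing an address with both halves folded in. It
does not use R's headers: ToyHashData, toy_scatter and toy_ptr_hash are
hypothetical names for illustration, and the multiplier in toy_scatter is an
assumption about what scatter in unique.c looks like, not a copy of it.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for HashData: the table has 2^K slots. */
typedef struct { int K; } ToyHashData;

/* Multiplicative scatter (assumed form): keep the top K bits of the
   32-bit product. Assumes unsigned int is 32 bits wide. */
static unsigned int toy_scatter(unsigned int key, const ToyHashData *d)
{
    assert(d->K > 0 && d->K < 32);
    return (3141592653U * key) >> (32 - d->K);
}

/* Hash the address itself, not the bytes it points to. */
static unsigned int toy_ptr_hash(const void *p, const ToyHashData *d)
{
    uintptr_t bits = (uintptr_t) p;
    /* Fold the high half into the low half. The shift is split in two
       so it stays well defined when uintptr_t is only 32 bits wide. */
    unsigned int key = (unsigned int) (bits ^ (bits >> 16 >> 16));
    return toy_scatter(key, d);
}
```

Calling toy_ptr_hash twice on the same address always gives the same slot, and
the result is always below 2^K. Whether folding is worth the extra work compared
with plain truncation would need measuring.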

rather than its current form, which appears to hash the string it points to:

static int shash(SEXP x, int indx, HashData *d)
{
    unsigned int k;
    const char *p;
    if(d->useUTF8)
        p = translateCharUTF8(STRING_ELT(x, indx));
    else
        p = translateChar(STRING_ELT(x, indx));
    k = 0;
    while (*p++)
        k = 11 * k + *p; /* was 8 but 11 isn't a power of 2 */
    return scatter(k, d);
}

Looking at sequal, below, and reading its comments: if the pointers are 
equal it doesn't look at the strings they point to, which led to the 
question above.

static int sequal(SEXP x, int i, SEXP y, int j)
{
    if (i < 0 || j < 0) return 0;
    /* Two strings which have the same address must be the same,
       so avoid looking at the contents */
    if (STRING_ELT(x, i) == STRING_ELT(y, j)) return 1;
    /* Then if either is NA the other cannot be */
    /* Once all CHARSXPs are cached, Seql will handle this */
    if (STRING_ELT(x, i) == NA_STRING || STRING_ELT(y, j) == NA_STRING)
        return 0;
    return Seql(STRING_ELT(x, i), STRING_ELT(y, j));
}
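
Note that sequal still falls through to Seql when the two addresses differ, so
a pointer hash would only agree with it if strings that Seql treats as equal
always share one address. A minimal sketch of that condition, using a toy
intern pool in place of R's CHARSXP cache; toy_intern, toy_equal and the pool
itself are hypothetical names for illustration, not part of R:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_POOL_MAX 64   /* hypothetical pool capacity */
#define TOY_STR_MAX  32   /* hypothetical longest string, including NUL */

static char toy_pool[TOY_POOL_MAX][TOY_STR_MAX];
static size_t toy_pool_used = 0;

/* Return the canonical copy of s, adding it on first sight.
   Returns NULL if s is too long or the pool is full. */
static const char *toy_intern(const char *s)
{
    size_t i, len;
    assert(s != NULL);
    len = strlen(s);
    if (len >= TOY_STR_MAX) return NULL;
    for (i = 0; i < toy_pool_used; i++)
        if (strcmp(toy_pool[i], s) == 0) return toy_pool[i];
    if (toy_pool_used == TOY_POOL_MAX) return NULL;
    memcpy(toy_pool[toy_pool_used], s, len + 1);
    return toy_pool[toy_pool_used++];
}

/* Same shape as sequal: address check first, contents as the fallback. */
static int toy_equal(const char *a, const char *b)
{
    if (a == b) return 1;
    return strcmp(a, b) == 0;
}
```

Two separate buffers holding "foo" compare equal under toy_equal but live at
different addresses, so hashing their addresses could put them in different
slots. After toy_intern both map to one address and a pointer hash is safe. If
the same text can exist under two addresses (for example, if strings in
different encodings are cached separately), hashing the pointer would miss
matches that the content-based shash finds.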

Matthew


