[Rd] Hashed environments of size <5 never grow

Martin Maechler m@ech|er @end|ng |rom @t@t@m@th@ethz@ch
Thu Apr 14 08:59:30 CEST 2022


>>>>> Duncan Garmonsway 
>>>>>     on Mon, 11 Apr 2022 22:24:52 +0100 writes:

    > Hello,

    > Hashed environments that begin with a (non-default) size
    > of 4 or less, will never grow, which is very detrimental
    > to performance.  For example,

    > ```
    > n <- 10000
    > l <- vector("list", n)
    > l <- setNames(l, seq_len(n))

    > # Takes a second, and nchains remains 1.
    > e1 <- list2env(l, hash = TRUE, size = 1)
    > env.profile(e1)$nchains
    > # [1] 1

    > # Returns instantly, and nchains grows to 6950
    > e2 <- list2env(l, hash = TRUE, size = 5)
    > env.profile(e2)$nchains
    > # [1] 6950
    > ```

    > The cause is that, when calling the growth function, the new size is
    > truncated to an integer.  See src/main/envir.c line 440, or
    > https://github.com/wch/r-source/blob/d9b9d00b6d2764839f229bf011dda8d027aae227/src/main/envir.c#L440

    > Given the hard-coded growth rate of 1.2, any size of 4 or less will be
    > truncated back to itself.

    > (int) (1 * 1.2 ) = 1
    > (int) (2 * 1.2) = 1
    > (int) (3 * 1.2) = 1
    > (int) (4 * 1.2) = 1
    > (int) (5 * 1.2) = 6

Yes.  I'm convinced this has been oversight and should be
corrected.

    > This is a rare case, and I couldn't find any examples in CRAN packages of
    > the `size` argument being used at all, let alone so small.  Even so, it
    > tripped me up, and could be fixed by using `ceil()` in src/main/envir.c
    > line 440 as follows.

    > new_table = R_NewHashTable((int)(ceil(HASHSIZE(table) *
    > HASHTABLEGROWTHRATE)))

Indeed, this bug would surface very very rarely,
but I agree a fix such as your proposition  is appropriate.

I'll do so, also adding a regression test.

Martin Maechler
ETH Zurich   and  R Core team


    > Kind regards,
    > Duncan Garmonsway



More information about the R-devel mailing list