[R] unique possible bug

Uwe Ligges ligges at statistik.tu-dortmund.de
Thu Oct 6 16:00:19 CEST 2011



On 05.10.2011 22:15, Patrick McCann wrote:
> Hi,
>
> I am trying to read in a rather large list of transactions using the
> arules library.

You mean the arules package?


> It seems in the coerce method into the dgCmatrix, it
> somewhere calls unique. Unique.c throws an error when  n>  536870912;
> however, when 4*n was modified to 2*n in 2004, the overflow protection
> should have changed from 2^29 to 2^30, right? If so, how would I
> change it in my copy? Do I have to recompile everything?

Yes.

There is also the way to ask the maintainer to improve it, but it won't 
work without reinstallation of the changed package sources.

Uwe Ligges


>
> Thanks,
> Patrick McCann
>
>
> Here is a simple to reproduce example:
>> runif(2^29+5)->a
>> sum(unique(a))->b
> Error in unique.default(a) : length 536870917 is too large for hashing
>> traceback()
> 3: unique.default(a)
> 2: unique(a)
> 1: unique(a)
>> unique.default
> function (x, incomparables = FALSE, fromLast = FALSE, ...)
> {
>      z<- .Internal(unique(x, incomparables, fromLast))
>      if (is.factor(x))
>          factor(z, levels = seq_len(nlevels(x)), labels = levels(x),
>              ordered = is.ordered(x))
>      else if (inherits(x, "POSIXct"))
>          structure(z, class = class(x), tzone = attr(x, "tzone"))
>      else if (inherits(x, "Date"))
>          structure(z, class = class(x))
>      else z
> }
> <environment: namespace:base>
>
>> From http://svn.r-project.org/R/trunk/src/main/unique.c I see:
>
>
> /*
>   Choose M to be the smallest power of 2
>   not less than 2*n and set K = log2(M).
>   Need K>= 1 and hence M>= 2, and 2^M<= 2^31 -1, hence n<= 2^29.
>
>   Dec 2004: modified from 4*n to 2*n, since in the worst case we have
>   a 50% full table, and that is still rather efficient -- see
>   R. Sedgewick (1998) Algorithms in C++ 3rd edition p.606.
> */
> static void MKsetup(int n, HashData *d)
> {
>     int n4 = 2 * n;
>
>     if(n<  0 || n>  536870912) /* protect against overflow to -ve */
>         error(_("length %d is too large for hashing"), n);
>     d->M = 2;
>     d->K = 1;
>     while (d->M<  n4) {
>         d->M *= 2;
>         d->K += 1;
>     }
> }
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.



More information about the R-help mailing list