[R] unique possible bug

Uwe Ligges ligges at statistik.tu-dortmund.de
Thu Oct 6 16:59:05 CEST 2011


I see.  For now: Yes, you need to change and recompile.
I will take a look what was actually changed and will run some test cases.

Best,
Uwe


On 06.10.2011 16:50, Patrick McCann wrote:
> The error I am referring to is in unique.c in Base R, it cannot
> accomodate greater than 2^29 values, even though it appears the
> overflow protection should be 2^30. The only relevance of the arules
> package is I was using it while I discovered this issue.
>
> Thanks,
> Patrick
>
>
>
> 2011/10/6 Uwe Ligges<ligges at statistik.tu-dortmund.de>:
>>
>>
>> On 05.10.2011 22:15, Patrick McCann wrote:
>>>
>>> Hi,
>>>
>>> I am trying to read in a rather large list of transactions using the
>>> arules library.
>>
>> You mean the arules package?
>>
>>
>>> It seems in the coerce method into the dgCmatrix, it
>>> somewhere calls unique. Unique.c throws an error when  n>    536870912;
>>> however, when 4*n was modified to 2*n in 2004, the overflow protection
>>> should have changed from 2^29 to 2^30, right? If so, how would I
>>> change it in my copy? Do I have to recompile everything?
>>
>> Yes.
>>
>> There is also the way to ask the maintainer to improve it, but it won't work
>> without reinstallation of the changed package sources.
>>
>> Uwe Ligges
>>
>>
>>>
>>> Thanks,
>>> Patrick McCann
>>>
>>>
>>> Here is a simple to reproduce example:
>>>>
>>>> runif(2^29+5)->a
>>>> sum(unique(a))->b
>>>
>>> Error in unique.default(a) : length 536870917 is too large for hashing
>>>>
>>>> traceback()
>>>
>>> 3: unique.default(a)
>>> 2: unique(a)
>>> 1: unique(a)
>>>>
>>>> unique.default
>>>
>>> function (x, incomparables = FALSE, fromLast = FALSE, ...)
>>> {
>>>      z<- .Internal(unique(x, incomparables, fromLast))
>>>      if (is.factor(x))
>>>          factor(z, levels = seq_len(nlevels(x)), labels = levels(x),
>>>              ordered = is.ordered(x))
>>>      else if (inherits(x, "POSIXct"))
>>>          structure(z, class = class(x), tzone = attr(x, "tzone"))
>>>      else if (inherits(x, "Date"))
>>>          structure(z, class = class(x))
>>>      else z
>>> }
>>> <environment: namespace:base>
>>>
>>>>  From http://svn.r-project.org/R/trunk/src/main/unique.c I see:
>>>
>>>
>>> /*
>>>   Choose M to be the smallest power of 2
>>>   not less than 2*n and set K = log2(M).
>>>   Need K>= 1 and hence M>= 2, and 2^M<= 2^31 -1, hence n<= 2^29.
>>>
>>>   Dec 2004: modified from 4*n to 2*n, since in the worst case we have
>>>   a 50% full table, and that is still rather efficient -- see
>>>   R. Sedgewick (1998) Algorithms in C++ 3rd edition p.606.
>>> */
>>> static void MKsetup(int n, HashData *d)
>>> {
>>>     int n4 = 2 * n;
>>>
>>>     if(n<    0 || n>    536870912) /* protect against overflow to -ve */
>>>         error(_("length %d is too large for hashing"), n);
>>>     d->M = 2;
>>>     d->K = 1;
>>>     while (d->M<    n4) {
>>>         d->M *= 2;
>>>         d->K += 1;
>>>     }
>>> }
>>>
>>> ______________________________________________
>>> R-help at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide
>>> http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>



More information about the R-help mailing list