[R] "too large for hashing"

Adam D. I. Kramer adik-rhelp at ilovebacon.org
Thu Apr 5 21:07:15 CEST 2012


Thanks for your response, Duncan.

x$eventtype is a "character" vector (because the same hashing error
occurred when I first tried to read.table() the data while specifying
colClasses = c(..., "factor", ...)).

x really is that long:

> dim(x)
[1] 1093574297         12

...the x$eventtype field has three unique values.

(I'm currently using a workaround: building a numeric column with a chain
of ifelse() calls, then setting its class() to "factor" and assigning the
levels manually.)
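
A minimal sketch of that workaround on a toy vector, in case it helps anyone
else. The level names "a", "b", "c" are placeholders (not the real event
types), and the short vector stands in for the full x$eventtype column:

```r
# Toy stand-in for x$eventtype; the real column has 1093574297 elements
# and three unique values. The names "a", "b", "c" are made up.
eventtype <- c("a", "c", "b", "a", "c")
lev <- c("a", "b", "c")

# Map each string to an integer code with nested ifelse() calls.
# This never calls unique(), so the hashing step is not reached.
codes <- ifelse(eventtype == lev[1], 1L,
         ifelse(eventtype == lev[2], 2L,
         ifelse(eventtype == lev[3], 3L, NA_integer_)))

# Build the factor by hand: attach the levels, then set the class.
levels(codes) <- lev
class(codes) <- "factor"
f <- codes
```

The codes must be integers (hence 1L, 2L, 3L) for the result to behave like
a regular factor; anything that matches none of the levels becomes NA.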

--Adam

On Thu, 5 Apr 2012, Duncan Murdoch wrote:

> On 05/04/2012 2:03 PM, Adam D. I. Kramer wrote:
>> Hello,
>>
>>   	I'm doing some analysis on a rather large data set. In this case,
>> some simple commands are failing. For example, this one:
>> 
>> >  x$eventtype<- factor(x$eventtype)
>> Error in unique.default(x) : length 1093574297 is too large for hashing
>> 
>> ...I think this is a bug, because "hashing" should not be required for the
>> "factor" function. Am I right? The whole column does not need to be hashed,
>> only the unique keys. Sure, there is the potential to overflow the key
>> register, but this error should be thrown only if that occurs, no?
>
> It looks as though the error is coming when unique() tries to determine the
> unique levels in the argument, but really there's no way to answer your
> question without more information.  What type of object is x$eventtype?  Is
> it really 1093574297 elements long?  How many unique values does it have?
>
> Duncan Murdoch
>


