[R] R 2.10.0: Error in gsub/calloc

Richard R. Liu richard.liu at pueo-owl.ch
Tue Nov 3 23:59:32 CET 2009


Kenneth,

Thanks for the hint.  I downloaded and installed the latest patch, but
to no avail.  I can reproduce the error on a single sentence, the
longest in the document, which contains 743,393 characters.  It isn't a
true sentence, but since it is more than three standard deviations
longer than the mean sentence length, I might be able to use the mean
and the standard deviation to weed out the really evident
"non-sentences" before I take the characteristics of the tokens into
account.
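
In code, the weeding step would look something like this (the
three-standard-deviation cutoff is just my working assumption):

    len <- nchar(d)                     # characters per candidate sentence
    cutoff <- mean(len) + 3 * sd(len)   # three standard deviations above the mean
    d.keep <- d[len <= cutoff]          # candidates retained for token analysis
    d.out  <- d[len > cutoff]           # evident "non-sentences"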

Regards,
Richard

On Nov 3, 2009, at 20:44 , Kenneth Roy Cabrera Torres wrote:

> Try the patched version...
> Maybe it is the same problem I had with a large
> database when using gsub()
>
> HTH
>
> On Tue, 2009-11-03 at 20:31 +0100, Richard R. Liu wrote:
>> I apologize for not being clear.  d is a character vector of length
>> 158908.  Each element in the vector has been designated by sentDetect
>> (package: openNLP) as a sentence.  Some of these are really
>> sentences.  Others are merely groups of meaningless characters
>> separated by white space.  strapply is a function in the package
>> gsubfn.  It applies the regular expression (second argument) to each
>> element of the first argument.  Every match is then sent to the
>> designated function (third argument, in my case missing, hence the
>> identity function).  Thus, with strapply I am simply performing a
>> white-space tokenization of each sentence.  I am doing this in the
>> hope of being able to distinguish true sentences from false ones on
>> the basis of mean token length, maximum token length, or similar.
>>
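
Something like the following should give the same token statistics; a
minimal sketch that splits on runs of whitespace with strsplit()
instead of matching "\\w+" (the tokens therefore keep punctuation).
Since strsplit() does not go through gsub(), it might also avoid the
allocation error:

    tok <- strsplit(d, "\\s+", perl = TRUE)        # one token vector per sentence
    tok <- lapply(tok, function(t) t[nzchar(t)])   # drop empty leading tokens
    # assumes every sentence contains at least one token
    mean.len <- sapply(tok, function(t) mean(nchar(t)))
    max.len  <- sapply(tok, function(t) max(nchar(t)))
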
>> Richard R. Liu
>> Dittingerstr. 33
>> CH-4053 Basel
>> Switzerland
>>
>> Tel.:  +41 61 331 10 47
>> Email:  richard.liu at pueo-owl.ch
>>
>>
>> On Nov 3, 2009, at 18:30 , Uwe Ligges wrote:
>>
>>>
>>>
>>> richard.liu at pueo-owl.ch wrote:
>>>> I'm running R 2.10.0 under Mac OS X 10.5.8; however, I don't think
>>>> this is a Mac-specific problem.
>>>> I have a very large (158,908 possible sentences, ca. 58 MB) plain
>>>> text document d which I am trying to tokenize:
>>>> t <- strapply(d, "\\w+", perl = T).  I am encountering the
>>>> following error:
>>>
>>>
>>> What is strapply() and what is d?
>>>
>>> Uwe Ligges
>>>
>>>
>>>
>>>
>>>> Error in base::gsub(pattern, rs, x, ...) :
>>>> Calloc could not allocate (-1398215180 of 1) memory
>>>> This happens regardless of whether I run in 32- or 64-bit mode.
>>>> The machine has 8 GB of RAM, so I can hardly believe that RAM is
>>>> a problem.
>>>> Thanks,
>>>> Richard
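
Incidentally, the negative size in that Calloc message looks like a
32-bit integer overflow rather than genuine memory exhaustion.  Read as
an unsigned 32-bit value it corresponds to a request of roughly 2.9 GB:

    -1398215180 + 2^32    # 2896752116 bytes, about 2.9 GB

so the length computation inside gsub() has presumably wrapped around,
which points to a problem with very long strings rather than a RAM
limit.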


