[R] Memory problems with large dataset in rpart

Prof Brian Ripley ripley at stats.ox.ac.uk
Tue Oct 18 08:50:39 CEST 2005


Looks like you have missed the section in the rw-FAQ entitled

     2.9 There seems to be a limit on the memory it uses!

and not set --max-mem-size (which defaults to 1Gb on your system).
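
A minimal sketch, assuming a 32-bit Windows build of R from that era where
memory.limit() is available (the 2047M figure is only an illustration of the
2GB per-process address-space cap):

    ## From the Windows command line, start R with a larger limit, e.g.
    ##     Rgui.exe --max-mem-size=2047M
    ## or, from within a running R session:
    memory.limit()              # report the current limit
    memory.limit(size = 2047)   # request a larger limit, in megabytes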

However, it looks like your problem is memory fragmentation, and trying to 
run 1Gb tasks in a 2Gb address space is intrinsically a problem to which 
the only solution is a 64-bit version of R.

BTW, the /3GB switch has nothing whatsoever to do with 'swap files': if both 
the OS and the application are configured correctly it increases the user 
address space *for that process* to 3GB (whereas swap space is shared 
between processes).

On Tue, 18 Oct 2005 vheijst at few.eur.nl wrote:

> Dear helpers,
>
> I am a Dutch student from the Erasmus University. For my Bachelor thesis I
> have written a script in R using boosting by means of classification and
> regression trees. This script uses the predefined function rpart. My input
> file consists of about 4000 vectors, each having 2210 dimensions. In the
> third iteration R complains of a lack of memory, even though every variable
> is removed from memory at the end of each iteration; the first two
> iterations therefore run without any problems.
>
> My computer runs on Windows XP and has 1 gigabyte of internal memory.
> I tried to let R use more memory by reconfiguring the swap files as
> mentioned in the FAQ (/3GB), but I didn't succeed in making this work.
> The command round(memory.limit()/1048576.0, 2) gives 1023.48
>
> If the memory limit cannot be increased, perhaps the size of the rpart
> object could be reduced by not storing unnecessary information.
> The rpart function call is (the FALSE arguments are an attempt to reduce
> the size of the fit object):
> fit <- rpart(price ~ ., data = trainingset,
>              control = rpart.control(maxdepth = 2, cp = 0.001),
>              model = FALSE, x = FALSE, y = FALSE)
>
> This fit object is later used in two predict calls, for example:
> predict(fit,newdata=sample)
>
> Can anybody please help me let R use more memory (for example via swap),
> or help me reduce the size of the fit object?
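
One way to keep the per-iteration footprint down is to drop the large
objects explicitly and run the garbage collector between boosting rounds.
A minimal sketch along those lines, reusing the poster's 'trainingset' and
'sample' objects; 'nboost' and the weight-update step are hypothetical
placeholders, not part of the original script:

    library(rpart)

    for (b in 1:nboost) {
        ## grow a small tree without keeping the model frame or x/y copies
        fit <- rpart(price ~ ., data = trainingset,
                     control = rpart.control(maxdepth = 2, cp = 0.001),
                     model = FALSE, x = FALSE, y = FALSE)
        pred <- predict(fit, newdata = sample)
        ## ... update boosting weights / accumulate predictions here ...
        rm(fit, pred)   # drop the large objects as soon as they are used
        gc()            # run garbage collection so the memory can be reused
    }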

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595



