[R] Memory problems with large dataset in rpart

vheijst at few.eur.nl
Tue Oct 18 06:54:10 CEST 2005


Dear helpers,

I am a Dutch student at Erasmus University. For my Bachelor thesis I have
written an R script that performs boosting with classification and regression
trees, using the predefined function rpart. My input file consists of about
4000 vectors, each with 2210 dimensions. The first two iterations run without
any problems, but in the third iteration R complains of a lack of memory,
even though every variable is removed from memory at the end of each
iteration.
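
For what it is worth, the cleanup inside the loop looks roughly like the
sketch below; the loop variable, n_iter and the weight update are only
placeholders, not my exact code:

library(rpart)

# 'trainingset' is read from my input file earlier; 'n_iter' stands for the
# number of boosting iterations
for (m in 1:n_iter) {
  fit <- rpart(price ~ ., data = trainingset,
               control = rpart.control(maxdepth = 2, cp = 0.001),
               model = FALSE, x = FALSE, y = FALSE)
  pred <- predict(fit, newdata = trainingset)
  # ... update the boosting weights from pred ...
  rm(fit, pred)   # remove the large objects at the end of the iteration
  gc()            # ask R to release the freed memory
}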

My computer runs Windows XP and has 1 gigabyte of internal memory.
I tried to let R use more memory by reconfiguring the swap files as mentioned
in the FAQ (the /3GB switch), but I did not succeed in making this work.
The command round(memory.limit()/1048576.0, 2) gives 1023.48
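
Is something like the following the proper way to raise the limit from within
R on Windows? The size value is only a guess at what my machine could support:

memory.limit()              # current limit (in bytes here, hence the division above)
memory.limit(size = 2047)   # request a higher limit in Mb, if the OS allows it
# or start R with a larger limit from the command line:
#   Rgui.exe --max-mem-size=2047M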

If such an increase of memory is not possible, perhaps the size of the rpart
object could be reduced by not storing unnecessary information.
The rpart function call is (the FALSE arguments are an attempt to reduce the
size of the fit object):

fit <- rpart(price ~ ., data = trainingset,
             control = rpart.control(maxdepth = 2, cp = 0.001),
             model = FALSE, x = FALSE, y = FALSE)
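
I also considered measuring the fitted object and dropping pieces of it
afterwards, but I do not know which components predict() really needs, so the
removals below are only guesses on my part:

print(object.size(fit))   # size of the fitted tree in bytes
# guesses at components that might be dropped; I have not verified that
# predict() still works without them:
fit$call  <- NULL
fit$where <- NULL         # one entry per training observation
print(object.size(fit))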

This fit object is later passed to two calls to predict, for example:
predict(fit, newdata = sample)
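
If it would help, I could also run the predictions in smaller chunks rather
than all at once; a rough sketch (the chunk size is arbitrary):

n     <- nrow(sample)
chunk <- 500
idx   <- split(1:n, ceiling((1:n) / chunk))   # row indices in blocks of 500
preds <- unlist(lapply(idx, function(i) predict(fit, newdata = sample[i, ])))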

Can anybody please help me let R use more memory (for example via swap), or
help me reduce the size of the fit object?

Kind regards
Dennis van Heijst
Student Informatics & Economics
Erasmus University Rotterdam
The Netherlands



