[R] RandomForest & memory demand

Liaw, Andy andy_liaw at merck.com
Tue Nov 25 14:56:28 CET 2003


> From: Christian Schulz
> 
> Hi,
> 
> is it correct that I need ~2 GB of RAM to be able
> to work with the default setting ntree=500 and a
> data.frame with 100,000 rows and at most 10 columns
> for training and testing?

If you have the test set, and don't need the forest for predicting other
data, you can give both training data and test data to randomForest() at the
same time (if that fits in memory).  This way only one tree is kept in
memory at a time.  E.g., you would do something like:

my.result <- randomForest(x, y, xtest)

Then my.result$test will contain a list of results on the test set.  If you
also give ytest, there will be a bit more output.
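For concreteness, here is a slightly fuller sketch of that call (component
names follow the randomForest package as I recall it; x, y, xtest, and ytest
stand for your own training/test data, and str() will show you exactly what
comes back):

library(randomForest)

## With xtest supplied (and keep.forest left at its default, which is
## FALSE when xtest is given), the forest itself is not retained, so
## memory use stays modest.
my.result <- randomForest(x, y, xtest = xtest, ytest = ytest, ntree = 500)

## Test-set results are in the $test component.
str(my.result$test)
my.result$test$predicted    ## predicted classes for xtest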

If you follow Torsten's suggestion, you can use the combine() function to
merge the five forests into one.
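If that is the route you take, a rough sketch (assuming Torsten's suggestion
amounts to growing the forest in several smaller runs, here five runs of
ntree = 100; combine() is in the randomForest package):

library(randomForest)

## Grow five smaller forests and merge them into one.
forests <- lapply(1:5, function(i) randomForest(x, y, ntree = 100))
big.forest <- do.call(combine, forests)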
 
> P.S.
> Is it possible to approximately calculate the
> memory demand for different settings of RF?

The current implementation of the code requires (assuming classification, no
test data, and proximity=FALSE) approximately:

At R level:
- One copy of the training data.
- 6*(2n+1)*ntree integers for storing the forest.

At C level (dynamically allocated):
- (2n + 37)*nclass + 9*n + p*(2+nclass) doubles.
- 5 + (3*p + 22)*n + 5*(p + nclass) integers.

(nclass is the number of classes, n the number of cases in the training
data, and p the number of variables.)
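As a rough back-of-the-envelope check against Christian's case (n = 100,000
and p = 10 from the question; nclass = 2 and 4-byte integers / 8-byte doubles
are assumptions made here for illustration), the formulas give a bit over
2 GB, which is consistent with the ~2 GB figure asked about:

## Plug the formulas above into R (nclass = 2 is assumed; the original
## question does not say how many classes there are).
n      <- 100000
p      <- 10
ntree  <- 500
nclass <- 2

forest.ints <- 6 * (2 * n + 1) * ntree                         ## R level
c.doubles   <- (2 * n + 37) * nclass + 9 * n + p * (2 + nclass)
c.ints      <- 5 + (3 * p + 22) * n + 5 * (p + nclass)

## Approximate bytes, excluding the copy of the training data itself.
bytes <- 4 * (forest.ints + c.ints) + 8 * c.doubles
bytes / 2^30    ## roughly 2.3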

HTH,
Andy
 
> Many thanks & regards,
> Christian



