[R] How do I make R randomForest model size smaller?

Liaw, Andy andy_liaw at merck.com
Tue Dec 4 15:39:29 CET 2012


Try the following:

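library(randomForest)

# Fit via the formula interface
set.seed(100)
rf1 <- randomForest(Species ~ ., data=iris)
# Fit the same model via the x/y interface
set.seed(100)
rf2 <- randomForest(iris[1:4], iris$Species)
# Compare total sizes, then inspect which components differ
object.size(rf1)
object.size(rf2)
str(rf1)
str(rf2)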

You can try it on your own data.  That should give you some hints about why the formula interface should be avoided with large datasets.
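
For the "just the forest" part of the question, here is a minimal sketch rather than a documented recipe. It assumes that predict() on new data does not read the per-observation training components (predicted, votes, oob.times, y); that is a reading of the package's behaviour, not something the help pages promise, so check it against your installed version. strip_rf is a hypothetical helper name, not a function in the randomForest package.

```r
library(randomForest)

set.seed(100)
rf <- randomForest(iris[1:4], iris$Species, ntree = 50, maxnodes = 30)

# Size of each component in bytes, largest first, to see what dominates.
sizes <- sort(sapply(rf, object.size), decreasing = TRUE)
print(sizes)

# Hypothetical helper: drop the components that grow with the number of
# training rows. ASSUMPTION: predict() with newdata does not need them.
strip_rf <- function(fit) {
  for (nm in c("predicted", "votes", "oob.times", "y")) {
    fit[[nm]] <- NULL
  }
  fit
}

small <- strip_rf(rf)

# Class-probability predictions should match between the two objects.
p_full  <- predict(rf,    iris[1:4], type = "prob")
p_small <- predict(small, iris[1:4], type = "prob")
print(all.equal(p_full, p_small))

# Save only the stripped object for later use.
# saveRDS(small, "rf_small.rds")
```

On iris the saving is small, but these components scale with the number of rows, so the difference should matter far more at 7 million rows. print() and other summaries that rely on the out-of-bag output may no longer work on the stripped object.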

Andy

-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of John Foreman
Sent: Monday, December 03, 2012 3:43 PM
To: r-help at r-project.org
Subject: [R] How do I make R randomForest model size smaller?

I've been training randomForest models on 7 million rows of data (41
features). Here's an example call:

myModel <- randomForest(RESPONSE~., data=mydata, ntree=50, maxnodes=30)

I thought surely with only 50 trees and 30 terminal nodes that the memory
footprint of "myModel" would be small. But it's 65 megs in a dump file. The
object seems to be holding all sorts of predicted, actual, and vote data
from the training process.

What if I just want the forest and that's it? I want a tiny dump file that
I can load later to make predictions off of quickly. I feel like the forest
by itself shouldn't be all that large...

Anyone know how to strip this sucker down to just something I can make
predictions off of going forward?
