[R] randomForest speed improvements

Liaw, Andy andy_liaw at merck.com
Wed Jan 5 16:20:40 CET 2011


Note that that isn't exactly what I recommended.  If you look at the
example in the help page for combine(), you'll see that it is combining
RF objects trained on the same data; i.e., instead of having one RF with
500 trees, you can combine five RFs trained on the same data with 100
trees each into one 500-tree RF.
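A minimal sketch of that pattern, assuming the built-in iris data purely for illustration (the data set, seed, and the names rf_parts and rf_all are placeholders, not from the original thread):

```r
library(randomForest)

set.seed(1)

# Fit five forests of 100 trees each, all on the SAME full data set.
rf_parts <- lapply(1:5, function(i) {
  randomForest(Species ~ ., data = iris, ntree = 100)
})

# Merge the five 100-tree forests into a single 500-tree forest.
rf_all <- do.call(randomForest::combine, rf_parts)

rf_all$ntree                   # total number of trees in the merged forest
head(predict(rf_all, iris))    # the merged object predicts like any other RF
```

Because the five fits are independent of one another, the lapply() call is the part that could be spread across cores or machines. Note that, per the combine() help page, the out-of-bag components (err.rate, confusion) of the combined object are NULL, so accuracy should be checked on held-out data.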

The way you are using combine() (training each RF on a different subset
of the rows) is basically using sample size to limit tree size, which
you can also do with the nodesize argument in randomForest(), as I
suggested previously.  Either way is fine as long as you don't see
prediction performance degrading.
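As a rough illustration of the nodesize route, the sketch below fits one forest with the default nodesize and one with a larger value, then compares tree size and out-of-bag error. The iris data, the value 20, and the object names are assumptions chosen only for the example:

```r
library(randomForest)

set.seed(1)

# Default nodesize (1 for classification) grows the trees out fully.
rf_default <- randomForest(Species ~ ., data = iris, ntree = 200)

# A larger nodesize stops splitting earlier, giving smaller, faster trees.
rf_small <- randomForest(Species ~ ., data = iris, ntree = 200, nodesize = 20)

# Average number of terminal nodes per tree in each forest.
mean(treesize(rf_default))
mean(treesize(rf_small))

# Out-of-bag error after all 200 trees, to check for degraded performance.
rf_default$err.rate[200, "OOB"]
rf_small$err.rate[200, "OOB"]
```

If the two error rates are close, the smaller trees are probably an acceptable trade for the shorter training time; if the second is clearly worse, nodesize has been pushed too far for that data.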

Andy

> -----Original Message-----
> From: r-help-bounces at r-project.org 
> [mailto:r-help-bounces at r-project.org] On Behalf Of apresley
> Sent: Tuesday, January 04, 2011 6:30 PM
> To: r-help at r-project.org
> Subject: Re: [R] randomForest speed improvements
> 
> 
> Andy,
> 
> Thanks for the reply.  I had no idea I could combine them back ... that
> actually will work pretty well.  We can have several "worker threads"
> load up the RFs on different machines and/or cores, and then
> re-assemble them.  Rmpi might be an option down the road, but would be
> a bit of overhead for us now.
> 
> Using the method of combine() ... I was able to drastically reduce the
> amount of time to build randomForest objects.  I.e., using about 25,000
> rows (6 columns), it takes maybe 5 minutes on my laptop.  Using 5
> randomForest objects (each with 5k rows), and then combining them,
> takes < 1 minute.
> 
> --
> Anthony


