[R] randomForest question--problem with ntree

Michael Knudsen micknudsen at gmail.com
Fri Aug 14 12:03:40 CEST 2009


On Thu, Aug 13, 2009 at 11:11 PM, Mary Putt <mputt at mail.med.upenn.edu> wrote:

Hi Mary,

> I would like to use a random Forest model to get an idea about which variables from a dataset may have some prognostic significance in a smallish study. The default for the number of trees seems to be 500. I tried changing the default to ntree=2000 or ntree=200 and the results appear identical. Have changed mtry from mtry=5 to mtry=6 successfully. Have seen same problem on both a Windows machine and our linux system running 2.8 and 2.9.

I don't think it's correct to call it a problem; it's more likely a
feature! Take a look at Breiman's paper (in the journal "Machine
Learning"), where he introduces random forests. I read it
recently, and somewhere he explicitly mentions that ntree can often be
set quite low without hurting performance.

The random forest algorithm is very robust, and apparently 500 trees
are usually more than enough. So you shouldn't expect better results
from 2000 trees, and using fewer trees (e.g. 200) often doesn't
affect the performance either.
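
If you want to check this on your own data, here is a rough sketch of
how one might look at the out-of-bag (OOB) error as the forest grows.
I don't have your data, so the built-in iris data set, the seed and
mtry = 2 are just stand-ins (my assumptions), not a recommendation:

```r
library(randomForest)

# Assumption: iris stands in for the real study data.
set.seed(1)  # fix the RNG so repeated runs are comparable
fit <- randomForest(Species ~ ., data = iris, ntree = 2000, mtry = 2)

# For classification, err.rate has one row per tree; the "OOB" column
# is the out-of-bag error of the forest made of the first k trees.
oob <- fit$err.rate[, "OOB"]
print(oob[c(50, 200, 500, 2000)])

# The fitted object records how many trees were actually grown, which
# is a quick way to confirm that the ntree argument was picked up.
print(fit$ntree)
```

If the OOB error is already flat well before 500 trees, then ntree=200
and ntree=2000 giving nearly the same answer is what one would expect.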

Best,
Michael

-- 
Michael Knudsen
micknudsen at gmail.com
http://lifeofknudsen.blogspot.com/



