[R] seek non-black box alternative to randomForest

Tue May 30 21:27:42 CEST 2017

Barry, 

This is mostly a mailing list about R - you may have more luck with statistical questions on www.stat.stackexchange.com.

That said - the editor is wrong. The limitation of trees that random forests “solve” is overfitting. The mechanism by which a random forest is built is not a black box - each tree is grown on a random sample of rows, and a random subset of features is considered at each split. The reasons why this approach avoids the issues associated with trees are also clear. These are theory-based claims. The random selection is critical to the function of the process. I’d suggest resubmitting the paper to a different journal instead of trying to find some way to fit a random forest without the random part.
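
To make the point about the random part concrete, here is a minimal sketch, not a definitive analysis. It assumes the randomForest package is installed and uses the built-in mtcars data purely as a stand-in for your data set; the seed value and ntree are arbitrary choices. It illustrates that the randomness is reproducible once the seed is fixed, and that setting mtry to the number of predictors removes the random feature selection, which leaves plain bagging (the idea behind caret::treebag, which still resamples rows).

```r
# Sketch only: assumes the randomForest package is installed.
# mtcars is a stand-in data set, not the poster's data.
library(randomForest)

x <- mtcars[, -1]   # 10 predictors
y <- mtcars$mpg     # numeric target, so this is a regression forest

# Fit the same forest twice from the same seed.
set.seed(42)
rf1 <- randomForest(x, y, ntree = 500, mtry = 3)
set.seed(42)
rf2 <- randomForest(x, y, ntree = 500, mtry = 3)

# With no newdata, predict() returns the out-of-bag predictions;
# comparing them checks that the two fits agree.
identical(predict(rf1), predict(rf2))

# mtry = ncol(x): every predictor is a candidate at every split,
# so only the bootstrap sampling of rows remains (bagging).
set.seed(42)
bag <- randomForest(x, y, ntree = 500, mtry = ncol(x))

# Out-of-bag mean squared error after the last tree, for both fits.
c(forest = tail(rf1$mse, 1), bagging = tail(bag$mse, 1))

# Per-predictor importance (total decrease in node impurity).
importance(rf1)
```

Reporting the seed, ntree and mtry alongside the out-of-bag error and the importance table may go some way toward answering the "intractable" objection, since anyone can then regenerate exactly the same forest.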

> On May 30, 2017, at 1:54 PM, Barry King <barry.king at qlx.com> wrote:
> 
> I've recently had a research manuscript rejected by an editor. The
> manuscript showed that for a real life data set, random forest
> outperformed multiple linear regression with respect to predicting the
> target variable. The editor's objection was that random forest is a
> black box where the random assignment of features to trees was
> intractable. I need to find an alternative method to random forest that
> does not suffer from the black box label. Any suggestions? Would
> caret::treebag be free of random assignment of features? Your
> assistance is appreciated.