[R] rpart and randomforest results

Liaw, Andy andy_liaw at merck.com
Mon Apr 7 17:23:36 CEST 2014


Hi Sonja,

How did you build the rpart tree (i.e., what settings did you use in rpart.control())?  By default, rpart uses cross-validation to prune back the tree, whereas randomForest doesn't need that.  There are other, more subtle differences as well.  If you want to compare single-tree results, you really want to make sure the settings in the two are as close as possible.  Also, how did you compute the pseudo-R2: on a test set, or some other way?
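[For illustration, here is a rough sketch of how one might align the two, along with a test-set pseudo-R2. The object names (`train`, `test`, `y`) and the particular settings are illustrative assumptions, not a definitive recipe; note that rpart's defaults (cp = 0.01, minsplit = 20, xval = 10) differ from randomForest's regression defaults (no complexity penalty, nodesize = 5).]

```r
library(rpart)
library(randomForest)

## Assumed objects: training data `train`, test data `test`, response `y`.
## Switch off rpart's complexity penalty and cross-validation so the tree
## is grown out like a randomForest tree, and match the terminal-node size.
fit_rpart <- rpart(y ~ ., data = train,
                   control = rpart.control(cp = 0, xval = 0,
                                           minsplit = 2, minbucket = 5))

fit_rf <- randomForest(y ~ ., data = train,
                       ntree = 1,                 # a single tree
                       mtry = ncol(train) - 1,    # all predictors at each split
                       sampsize = nrow(train),    # all n cases ...
                       replace = FALSE,           # ... each used exactly once
                       nodesize = 5)

## Pseudo-R2 on a held-out test set: 1 - SSE/SST
pseudo_r2 <- function(obs, pred) {
  1 - sum((obs - pred)^2) / sum((obs - mean(obs))^2)
}
pseudo_r2(test$y, predict(fit_rpart, test))
pseudo_r2(test$y, predict(fit_rf,    test))
```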

Best,
Andy

-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of Schillo, Sonja
Sent: Thursday, April 03, 2014 3:58 PM
To: Mitchell Maltenfort
Cc: r-help at r-project.org
Subject: Re: [R] rpart and randomforest results

Hi,

the random forest should do that, you're totally right. As far as I know, it does so by randomly selecting the variables considered for a split (but here we set the option for how many variables to consider at each split to the total number of variables available, so we thought the random forest would not have the chance to select variables at random). The other source of randomness in randomForest is bootstrapping. But here again we set the sample-size option to the number of cases in the data set, so that no bootstrapping should be done.
We tried to take all the "randomness" out of randomForest.
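[For concreteness, the settings described above might look like the sketch below; `d` and `y` are placeholder names, not from the original post. One subtlety worth flagging: with the default replace = TRUE, setting sampsize to the number of rows still draws a bootstrap sample *with* replacement (roughly 63% unique rows), so replace = FALSE is also needed before every row is used exactly once.]

```r
library(randomForest)

## Sketch: "derandomized" single tree on an assumed data frame `d`
## with response column `y`.
rf_single <- randomForest(y ~ ., data = d,
                          ntree    = 1,            # one tree only
                          mtry     = ncol(d) - 1,  # all predictors considered
                                                   # at every split
                          sampsize = nrow(d),      # sample of size n ...
                          replace  = FALSE)        # ... without replacement,
                                                   # i.e. no bootstrap
```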

Is that plausible and does anyone have another idea?

Thanks
Sonja


From: Mitchell Maltenfort [mailto:mmalten at gmail.com]
Sent: Tuesday, April 1, 2014 1:32 PM
To: Schillo, Sonja
Cc: r-help at r-project.org
Subject: Re: [R] rpart and randomforest results


Is it possible that the random forest is somehow adjusting for optimism or overfitting?
On Apr 1, 2014 7:27 AM, "Schillo, Sonja" <Sonja.Schillo at uni-due.de> wrote:
Hi all,

I have a question on rpart and randomforest results:

We calculated a single regression tree using rpart and got a pseudo-R2 of around 10% (which is not too bad compared to a linear regression on this data). Encouraged by this, we grew a whole regression forest on the same data set using randomForest, but we got pretty bad pseudo-R2 values for the forest (even negative values for some option settings).
We then thought that if we built only a single tree with the randomForest routine, we should get a result similar to that of rpart. So we set the options for randomForest to grow only one tree, but the resulting pseudo-R2 value was negative as well.
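[A side note on how negative values can arise: if the pseudo-R2 is computed as 1 - SSE/SST, it drops below zero whenever the model predicts worse than a constant equal to the mean of the observed values. A toy illustration, not from the original post:]

```r
## Toy example: pseudo-R2 = 1 - SSE/SST turns negative when the
## predictions are worse than simply predicting mean(obs).
obs  <- c(1, 2, 3, 4, 5)
pred <- c(5, 1, 4, 0, 3)   # a deliberately poor set of predictions
1 - sum((obs - pred)^2) / sum((obs - mean(obs))^2)   # -2.8
```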

Does anyone have a clue as to why the randomForest results are so bad, whereas the rpart result is quite OK?
Is our assumption wrong that a single tree grown by randomForest should give results similar to one grown by rpart?
What am I missing here?

Thanks a lot for your help!
Sonja

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


