[R] randomForest out of bag prediction

Peter Langfelder peter.langfelder at gmail.com
Sat Jan 12 19:56:33 CET 2019


See inline.

On Sat, Jan 12, 2019 at 9:56 AM Witold E Wolski <wewolski at gmail.com> wrote:

> ypred_oob <- predict(diachp.rf)

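AFAIK these are, indeed, the out-of-bag predictions: when predict() is
called without newdata, randomForest returns the OOB predictions
stored in the fitted object.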

> dataX <- data %>% select(-quality) # remove response.
> ypred <- predict( diachp.rf, dataX )

These are not out-of-bag predictions. dataX is interpreted as new data
(argument newdata) and is assumed to contain entirely new
observations. Each observation in dataX is fed through all of the
trees and the predictions are then pooled. There is no out-of-bag
here: all of the new-data observations are assumed to be independent
of the training set, even though in this case they are the very rows
the trees were grown on.
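
To make the difference concrete, here is a minimal sketch. It assumes
the randomForest package is installed and uses the built-in iris data
as a stand-in for your data set, with Species playing the role of
quality; the object names are made up for illustration.

```r
library(randomForest)

set.seed(1)
# iris stands in for the poster's data; Species plays the role of `quality`.
rf <- randomForest(Species ~ ., data = iris, ntree = 200)

# No newdata: returns the out-of-bag predictions stored in the fit.
pred_oob <- predict(rf)

# Training predictors passed as newdata: every tree votes on every row,
# including the trees that saw that row during training.
pred_resub <- predict(rf, iris[, names(iris) != "Species"])

acc_oob   <- mean(pred_oob == iris$Species)    # honest error estimate
acc_resub <- mean(pred_resub == iris$Species)  # optimistic, training-set fit
print(c(oob = acc_oob, resubstitution = acc_resub))
```

The second accuracy should come out at least as high as the first,
and only the OOB figure says anything about performance on unseen
data.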

>
> What I find even more disturbing is that 100% accuracy for ypred.
> Would you agree that this is rather unexpected?

It is expected (and not disturbing) if your training set had enough
variables (or signal) to create trees that fit the training data
perfectly.
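
One way to see why the fit on the training data tends to be so good:
with the default bootstrap sampling, each observation is in-bag for
roughly 63% of the trees in expectation, so most of the trees voting
on a training row were grown using that very row. A rough sketch,
again assuming the randomForest package and using iris as a stand-in
for your data:

```r
library(randomForest)

set.seed(1)
# iris is only a placeholder data set for illustration.
rf <- randomForest(Species ~ ., data = iris, ntree = 200)

# oob.times counts, per observation, the trees for which it was out-of-bag.
oob_frac    <- mean(rf$oob.times) / rf$ntree
in_bag_frac <- 1 - oob_frac

# Expected out-of-bag fraction under sampling n rows with replacement.
n <- nrow(iris)
expected_oob <- (1 - 1 / n)^n

print(c(observed_oob = oob_frac, expected_oob = expected_oob,
        in_bag = in_bag_frac))
```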

HTH,

Peter


