[R] Question about randomForest

Sat Nov 26 21:02:33 CET 2011

I've been using the R package randomForest, but there is one aspect I
cannot work out the meaning of. After calling the randomForest
function, the returned object contains an element called predicted,
which is the prediction obtained using all the trees (at least that's
my understanding). I've checked that this prediction set has the same
error rate as the one reported by err.rate.

However, if I send the training data back into the
predict.randomForest function, I get a different result from the
stored set of predictions. This is true for both classification and
regression. The predictions obtained this way also have a much
lower error rate and perform very well (suspiciously well...) on
measures such as AUC.
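
To make the comparison concrete, here is a minimal sketch of what I am
doing. The built-in iris data, the seed and ntree = 500 are stand-ins
chosen purely for illustration (they are not my actual data or
settings); the stored predictions are read from the predicted
component of the fitted object.

```r
library(randomForest)

set.seed(1)  # arbitrary seed, only so the sketch is repeatable

# Illustrative fit on a built-in data set (an assumption, not my data)
rf <- randomForest(Species ~ ., data = iris, ntree = 500)

# Predictions stored in the fitted object
stored <- rf$predicted

# Predictions from sending the training data back through predict()
resub <- predict(rf, newdata = iris)

# Error rate of each prediction set against the true labels
err_stored <- mean(stored != iris$Species)
err_resub  <- mean(resub  != iris$Species)

# Overall error in the last row of err.rate, for comparison
err_reported <- rf$err.rate[rf$ntree, "OOB"]

print(c(stored = err_stored, resub = err_resub, reported = err_reported))

# Cross-tabulate the two prediction sets to see where they disagree
print(table(stored = stored, resub = resub))
```

On my real data the first and third numbers agree, while the second
is much lower.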

My understanding is that the two predictions above should be the same.
Since they are not, I must be misunderstanding something.
Any ideas what's going on?