[R] Random Forests: Predictor importance for Regression Trees

Dimitri Liakhovitski ld7631 at gmail.com
Mon Apr 20 20:35:50 CEST 2009


I think I am relatively clear on how the first predictor importance
measure (the permutation-based one) is calculated by Random Forests
for classification trees:

Importance of predictor P1 when the response variable is categorical:

1. For the out-of-bag (oob) cases of a given tree, randomly permute
their values on predictor P1 and then put them down that tree.
2. For a given tree, subtract the number of votes for the correct
class in the predictor-P1-permuted oob dataset from the number of
votes for the correct class in the untouched oob dataset: if P1 is
important, this number will be large.
3. The average of this number over all trees in the forest is the raw
importance score for predictor P1.
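The three classification steps above can be sketched in Python. The post contains no code, so the language is a choice for illustration only. This is a minimal sketch, not the randomForest package's implementation: it assumes each grown tree is represented by a hypothetical (predict, oob_X, oob_y) triple, and the helper names are made up for this example. It follows the vote-counting description literally and returns only the raw, unscaled score.

```python
import random
from typing import Callable, List, Sequence, Tuple

Row = List[float]
# Hypothetical stand-in for one grown tree: its prediction function plus
# the out-of-bag cases (rows and true classes) not used to grow it.
Tree = Tuple[Callable[[Row], int], Sequence[Row], Sequence[int]]


def permute_column(rows: Sequence[Row], j: int, rng: random.Random) -> List[Row]:
    """Return copies of the rows with column j randomly shuffled across rows."""
    col = [r[j] for r in rows]
    rng.shuffle(col)
    return [list(r[:j]) + [v] + list(r[j + 1:]) for r, v in zip(rows, col)]


def correct_votes(predict: Callable[[Row], int], rows: Sequence[Row], y: Sequence[int]) -> int:
    """Number of oob cases for which the tree votes for the true class."""
    return sum(1 for r, label in zip(rows, y) if predict(r) == label)


def raw_importance_classification(trees: Sequence[Tree], j: int, seed: int = 0) -> float:
    """Raw permutation importance of predictor j (0-based column index)."""
    rng = random.Random(seed)
    drops = []
    for predict, oob_X, oob_y in trees:
        # Step 1: permute predictor j among this tree's oob cases.
        permuted = permute_column(oob_X, j, rng)
        # Step 2: correct votes (untouched) minus correct votes (permuted).
        drops.append(correct_votes(predict, oob_X, oob_y)
                     - correct_votes(predict, permuted, oob_y))
    # Step 3: average over all trees.
    return sum(drops) / len(drops)


if __name__ == "__main__":
    # Toy "tree" that splits only on the first predictor (P1 = column 0).
    def stump(x: Row) -> int:
        return int(x[0] > 0.5)

    X = [[i / 100, (i * 7 % 100) / 100] for i in range(100)]
    y = [stump(x) for x in X]
    forest = [(stump, X, y)] * 5
    print("P1 (used):", raw_importance_classification(forest, 0))
    print("P2 (unused):", raw_importance_classification(forest, 1))
```

A predictor the trees never split on scores exactly zero, because permuting it cannot change any vote.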

I am wondering what step 2 above looks like if the response variable
is continuous rather than categorical, in other words, for a regression
tree. Could you please correct me if what I wrote below is wrong? Thank
you very much!

Importance of predictor P1 when the response variable is continuous:

1. For the out-of-bag (oob) cases of a given tree, randomly permute
their values on predictor P1 and then put them down that tree.
2. For a given tree, calculate the mean squared error (the mean of the
squared differences between observed y and predicted y) for (a) the
untouched oob dataset and (b) the predictor-P1-permuted oob dataset.
Subtract (a) from (b): if P1 is important, this number will be large.
3. The average of this number over all trees in the forest is the raw
importance score for predictor P1.
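The regression version described above can be sketched the same way, again in Python purely for illustration and under the same assumptions: each tree is a hypothetical (predict, oob_X, oob_y) triple and all helper names are invented for this example. Step 2 replaces vote counts with the oob mean squared error, so for each tree the score is MSE(b) minus MSE(a), and the raw importance is the average of that difference over trees. The R randomForest documentation describes additionally normalizing this average by the standard deviation of the differences (scale=TRUE); the sketch stops at the raw score.

```python
import random
from typing import Callable, List, Sequence, Tuple

Row = List[float]
# Hypothetical stand-in for one grown regression tree: its prediction
# function plus the out-of-bag rows and observed responses.
Tree = Tuple[Callable[[Row], float], Sequence[Row], Sequence[float]]


def permute_column(rows: Sequence[Row], j: int, rng: random.Random) -> List[Row]:
    """Return copies of the rows with column j randomly shuffled across rows."""
    col = [r[j] for r in rows]
    rng.shuffle(col)
    return [list(r[:j]) + [v] + list(r[j + 1:]) for r, v in zip(rows, col)]


def oob_mse(predict: Callable[[Row], float], rows: Sequence[Row], y: Sequence[float]) -> float:
    """Mean of (observed y - predicted y)^2 over the oob cases."""
    return sum((obs - predict(r)) ** 2 for r, obs in zip(rows, y)) / len(y)


def raw_importance_regression(trees: Sequence[Tree], j: int, seed: int = 0) -> float:
    """Raw permutation importance of predictor j (0-based column index)."""
    rng = random.Random(seed)
    increases = []
    for predict, oob_X, oob_y in trees:
        mse_a = oob_mse(predict, oob_X, oob_y)                              # (a) untouched
        mse_b = oob_mse(predict, permute_column(oob_X, j, rng), oob_y)      # (b) permuted
        increases.append(mse_b - mse_a)                                     # (b) minus (a)
    # Step 3: average over all trees.
    return sum(increases) / len(increases)


if __name__ == "__main__":
    # Toy "tree" whose prediction depends only on P1 (column 0).
    def line(x: Row) -> float:
        return 2.0 * x[0]

    X = [[i / 100, (i * 7 % 100) / 100] for i in range(100)]
    y = [line(x) for x in X]
    forest = [(line, X, y)] * 5
    print("P1 (used):", raw_importance_regression(forest, 0))
    print("P2 (unused):", raw_importance_regression(forest, 1))
```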

Dimitri Liakhovitski
MarketTools, Inc.
Dimitri.Liakhovitski at markettools.com
