[R] Random Forests: Predictor importance for Regression Trees

Mon Apr 20 20:35:50 CEST 2009

Hello!

I think I am fairly clear on how the first predictor importance measure
(the permutation-based one) is calculated by Random Forests for a
classification tree:

Importance of predictor P1 when the response variable is categorical:

1. For each tree, take its out-of-bag (oob) cases, randomly permute
their values on predictor P1, and then put them down that tree.
2. For a given tree, subtract the number of votes for the correct
class in the predictor-P1-permuted oob dataset from the number of
votes for the correct class in the untouched oob dataset: if P1 is
important, this number will be large.
3. The average of this number over all trees in the forest is the raw
importance score for predictor P1.
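The three steps above can be sketched in Python as follows. This is a minimal illustration, not the randomForest package's own code. It assumes that each fitted tree is a plain Python function mapping one row of predictor values to a class label, and that `oob_sets[t]` lists the row indices that were out-of-bag for tree `t`. The names `raw_importance_classification`, `trees`, `oob_sets`, and `p` (the column index of P1) are hypothetical.

```python
import random
from typing import Callable, List, Sequence

# Hypothetical representation: a fitted tree is a function from one row
# of predictor values to a predicted class label.
Tree = Callable[[List[float]], int]


def raw_importance_classification(
    trees: Sequence[Tree],
    oob_sets: Sequence[Sequence[int]],
    X: Sequence[Sequence[float]],
    y: Sequence[int],
    p: int,
    seed: int = 0,
) -> float:
    """Raw permutation importance of predictor column p (hypothetical helper).

    oob_sets[t] holds the indices of the rows that were out-of-bag for trees[t].
    """
    rng = random.Random(seed)
    diffs = []
    for tree, oob in zip(trees, oob_sets):
        # Step 1: permute predictor p among this tree's oob cases only.
        shuffled = [X[i][p] for i in oob]
        rng.shuffle(shuffled)
        correct_untouched = 0
        correct_permuted = 0
        for i, new_value in zip(oob, shuffled):
            row = list(X[i])
            correct_untouched += int(tree(row) == y[i])
            row[p] = new_value
            correct_permuted += int(tree(row) == y[i])
        # Step 2: correct votes on untouched oob minus correct votes on permuted oob.
        diffs.append(correct_untouched - correct_permuted)
    # Step 3: average the per-tree differences over all trees in the forest.
    return sum(diffs) / len(diffs)
```

A tree that never splits on P1 gets a difference of exactly 0 in this sketch, since permuting P1 cannot change any of its votes.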

I am wondering what step 2 above looks like if the response variable
is continuous and not categorical, in other words, for a regression
tree. Could you please correct me if what I wrote below is wrong? Thank
you very much!

Importance of predictor P1 when the response variable is continuous:

1. For each tree, take its out-of-bag (oob) cases, randomly permute
their values on predictor P1, and then put them down that tree.
2. For a given tree, calculate the mean squared error (the mean of the
squared differences between observed y and predicted y) for (a) the
untouched oob dataset and for (b) the predictor-P1-permuted oob
dataset. Subtract (a) from (b).
3. The average of this number over all trees in the forest is the raw
importance score for predictor P1.
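The regression procedure above can be sketched in Python as a minimal illustration under stated assumptions, not as the randomForest package's own code. It assumes that each fitted tree is a plain Python function mapping one row of predictor values to a numeric prediction, and that `oob_sets[t]` lists the row indices that were out-of-bag for tree `t`. The names `raw_importance_regression`, `mse`, `trees`, `oob_sets`, and `p` (the column index of P1) are hypothetical.

```python
import random
from typing import Callable, List, Sequence

# Hypothetical representation: a fitted regression tree is a function from
# one row of predictor values to a numeric prediction.
Tree = Callable[[List[float]], float]


def mse(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean of the squared differences between observed and predicted y."""
    return sum((o - q) ** 2 for o, q in zip(observed, predicted)) / len(observed)


def raw_importance_regression(
    trees: Sequence[Tree],
    oob_sets: Sequence[Sequence[int]],
    X: Sequence[Sequence[float]],
    y: Sequence[float],
    p: int,
    seed: int = 0,
) -> float:
    """Raw permutation importance of predictor column p (hypothetical helper)."""
    rng = random.Random(seed)
    diffs = []
    for tree, oob in zip(trees, oob_sets):
        # Step 1: permute predictor p among this tree's oob cases only.
        shuffled = [X[i][p] for i in oob]
        rng.shuffle(shuffled)
        observed = [y[i] for i in oob]
        pred_untouched = [tree(list(X[i])) for i in oob]
        pred_permuted = []
        for i, new_value in zip(oob, shuffled):
            row = list(X[i])
            row[p] = new_value
            pred_permuted.append(tree(row))
        # Step 2: (b) MSE on permuted oob minus (a) MSE on untouched oob.
        diffs.append(mse(observed, pred_permuted) - mse(observed, pred_untouched))
    # Step 3: average the per-tree differences over all trees in the forest.
    return sum(diffs) / len(diffs)
```

The sketch returns only the raw average. R's `importance()` in the randomForest package, with its default `scale = TRUE`, additionally divides this average by the standard deviation of the per-tree differences when reporting `%IncMSE`.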

-- 
Dimitri Liakhovitski
MarketTools, Inc.
Dimitri.Liakhovitski at markettools.com