[R] randomForest: predictor importance (for regressions)
Dimitri Liakhovitski
ld7631 at gmail.com
Wed May 5 19:51:18 CEST 2010
I have a question about predictor importances in randomForest.
Once I've run randomForest and have the fitted object, I can get the predictor importances:
rfresult$importance
I also get the "standard errors" of the permutation-based importance
measure: rfresult$importanceSD
I have 2 questions:
1. Because I am dealing with regressions, I am getting an importance object
(rfresult$importance) with two columns, labeled "%IncMSE" (the first column)
and "IncNodePurity" (the second column). I assume it's the first one that is
the mean decrease in accuracy due to permutation. Is that correct? I am
confused because ?randomForest says: "For Regression, the first
column is the mean decrease in accuracy and the second the mean decrease in
MSE." - but it is the first column, not the second, that has "MSE" in its
header.
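
To make question 1 concrete, here is a minimal sketch of what I am looking at. It assumes the built-in mtcars data, an arbitrary seed and ntree = 500 (all illustrative choices, not my real data), and it uses the importance() extractor, where as far as I understand type = 1 selects the permutation-based measure and type = 2 the node-impurity one:

```r
# Sketch only: mtcars, the seed and ntree are illustrative assumptions.
library(randomForest)

set.seed(42)
rf <- randomForest(mpg ~ ., data = mtcars, ntree = 500, importance = TRUE)

# Stored matrix: one row per predictor, two columns for regression.
colnames(rf$importance)   # "%IncMSE" "IncNodePurity"

# Permutation-based measure (type = 1), left unscaled; this should be
# the same numbers as the first column of rf$importance.
perm_raw <- importance(rf, type = 1, scale = FALSE)

# Node-impurity measure (type = 2); same as the second stored column.
node_purity <- importance(rf, type = 2)
```

If my reading is right, perm_raw matches the "%IncMSE" column and node_purity matches the "IncNodePurity" column.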
2. According to this thread
(http://www.mail-archive.com/r-help@stat.math.ethz.ch/msg94873.html), the
overall importance measure is mean(d[i]) / se(d[i]), where se(d[i]) is
sd(d[i])/sqrt(ntree) (the "standard error").
So, in order to get at the importance of predictors (and I want to use the
permutation-based importance), should I just take the first column of
rfresult$importance, or should I first divide that column by
rfresult$importanceSD to get something analogous to z-scores, and use
those?
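
The two candidates in question 2 can be compared side by side. This is a rough sketch, again assuming mtcars, an arbitrary seed and ntree = 500 as stand-ins for my data; the names raw, se and z_like are just mine. It divides the stored permutation column by importanceSD and compares the result with what importance() returns when scale = TRUE:

```r
# Sketch only: mtcars, the seed and ntree are illustrative assumptions.
library(randomForest)

set.seed(42)
rf <- randomForest(mpg ~ ., data = mtcars, ntree = 500, importance = TRUE)

# Candidate A: the stored permutation-based column, used as is.
raw <- rf$importance[, "%IncMSE"]

# Candidate B: the same column divided by its "standard error",
# giving something analogous to a z-score per predictor.
se     <- rf$importanceSD
z_like <- raw / se

# What importance() returns for the permutation measure when scaled.
scaled <- importance(rf, type = 1, scale = TRUE)[, 1]

# Compare values only, ignoring names.
all.equal(as.numeric(z_like), as.numeric(scaled))

# The two candidates need not rank the predictors the same way.
names(sort(raw, decreasing = TRUE))
names(sort(z_like, decreasing = TRUE))
```

If the comparison comes back TRUE, then dividing by importanceSD by hand is the same thing as asking importance() for the scaled measure, and the question reduces to whether scale = TRUE or scale = FALSE is the better choice.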
Thank you very much!
--
Dimitri Liakhovitski
Ninah.com
Dimitri.Liakhovitski at ninah.com