[R] Variable Importance - Random Forest

Henric Nilsson (Public) nilsson.henric at gmail.com
Sun Aug 26 00:32:37 CEST 2007


On 2007-08-24 21:13, Mathe, Ewy (NIH/NCI) [F] wrote:
> Hello,
>
> I am trying to explore the use of random forests for classification and
> am uncertain about the interpretation of the importance measurements.

In case you haven't already done so, you probably want to read

@ARTICLE{Strobl+Boulesteix+Zeileis+Hothorn:2007,
   author = {Carolin Strobl and Anne-Laure Boulesteix and Achim Zeileis
             and Torsten Hothorn},
   title = {Bias in Random Forest Variable Importance Measures:
            Illustrations, Sources and a Solution},
   journal = {{BMC} Bioinformatics},
   year = {2007},
   volume = {8},
   number = {25},
   url = {http://www.biomedcentral.com/1471-2105/8/25/}
}


HTH,
Henric



> When having the option "importance = T" in the randomForest call, the
> resulting 'importance' element matrix has four columns with the
> following headings:
> 
> 0 - mean raw importance score of variable x for class 0 (where
> importance is the difference between the permuted data error and the
> original test set error)
> 
> 1 - mean raw importance score of variable x for class 1
> 
> MeanDecreaseAccuracy : average lowering of the margin across all cases
> (where margin is the proportion of votes for the true class - the
> maximum proportion of votes for the other classes)
> 
> MeanDecreaseGini : summation of the Gini decreases over all trees in the
> forest
>
> Are these definitions correct?  Why is the raw importance score
> calculated for each class?  Could one just average the raw importance
> scores for class 0 and 1 to get a composite importance score?
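
One way to check these column definitions empirically is to fit a small forest and inspect the matrix directly. The following is a minimal sketch, assuming the randomForest package is installed; the two-class subset of the built-in iris data and the object names (d, rf, imp_raw, class_avg) are illustrative assumptions, not part of the original question. It prints the unscaled per-class scores next to the overall MeanDecreaseAccuracy column so the simple class average can be compared with it.

```r
# Assumption: the randomForest package is installed; iris is only a stand-in dataset.
library(randomForest)

set.seed(1)
# Keep two classes so the per-class columns mirror the 0/1 case in the question
d <- droplevels(subset(iris, Species != "setosa"))
rf <- randomForest(Species ~ ., data = d, importance = TRUE)

# Full importance matrix: one column per class, then
# MeanDecreaseAccuracy and MeanDecreaseGini
imp_raw    <- importance(rf, scale = FALSE)  # raw (unscaled) permutation scores
imp_scaled <- importance(rf)                 # default scale = TRUE
print(colnames(imp_raw))

# Simple average of the two class columns, side by side with the overall column
class_avg <- rowMeans(imp_raw[, levels(d$Species), drop = FALSE])
print(cbind(class_avg, overall = imp_raw[, "MeanDecreaseAccuracy"]))
```

The scale argument matters when comparing numbers: with the default scale = TRUE the permutation-based columns are divided by their standard errors, so they are no longer raw differences in error.
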
>
> Now, with the option "importance = F" in the randomForest call,
> the 'importance' element is a vector.  What values are those?
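
The importance = FALSE case can be inspected the same way. This is a minimal sketch under the same assumptions as before (randomForest installed, an illustrative two-class subset of iris, hypothetical object names). In recent versions of the package the element appears to hold only the Gini-based measure, stored as a one-column matrix; the 2007 version discussed here may have returned a plain vector instead.

```r
# Assumption: the randomForest package is installed; iris is only a stand-in dataset.
library(randomForest)

set.seed(1)
d <- droplevels(subset(iris, Species != "setosa"))

# Without importance = TRUE, no permutation-based measures are computed
rf_noimp <- randomForest(Species ~ ., data = d, importance = FALSE)

# Inspect what is stored: the column name identifies the measure
print(rf_noimp$importance)
print(colnames(rf_noimp$importance))
```
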
>
> Thank you in advance for any input you may have.
>
> Best,
> Ewy
>
> Ewy Mathe, Ph. D.
> Laboratory of Human Carcinogenesis
> National Cancer Institute, NIH
> 37 Convent Drive
> Building 37, Room 3068
> Bethesda, MD  20892-4255
> Tel: 301-496-5835
> Fax: 301-496-0497