[R] randomForest - what is a 'good' pseudo r-squared?

Liaw, Andy andy_liaw at merck.com
Tue Jul 21 16:55:19 CEST 2009


Generally speaking, a pseudo R^2 of 70% indicates a rather good model
(though that obviously depends on the kind of data you have at hand).
Because it's a "pseudo", not a "real", R^2, its range is not limited to
[0, 100%]: negative values are possible, but it's hard for me to imagine
anyone getting >100%.
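
To make the range concrete, here is a minimal sketch in base R. It
assumes the figure reported by print() for a regression forest is
1 - MSE / Var(y), with the MSE taken from out-of-bag predictions and the
variance computed with divisor n. The helper name pseudo_rsq and the toy
data are made up for illustration.

```r
# Hypothetical helper: pseudo R^2 as 1 - MSE / Var(y), in percent.
# Assumes `predicted` holds out-of-bag predictions and that the
# variance uses divisor n rather than n - 1.
pseudo_rsq <- function(observed, predicted) {
  mse   <- mean((observed - predicted)^2)
  var_n <- mean((observed - mean(observed))^2)
  100 * (1 - mse / var_n)
}

y <- c(2, 4, 6, 8, 10)  # toy response, illustration only

# Predicting the overall mean for every case gives 0%.
at_mean <- pseudo_rsq(y, rep(mean(y), length(y)))

# Predictions that are worse than the mean give a negative value.
worse <- pseudo_rsq(y, rev(y))

# Perfect predictions give 100%.
perfect <- pseudo_rsq(y, y)

print(c(at_mean = at_mean, worse = worse, perfect = perfect))
```

Under that assumption the MSE cannot be negative, so the value tops out
at 100%, while a forest that predicts worse than the plain mean of the
response drops below 0.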

You may want to check the distribution of the response (or residuals) to
see if a transformation is appropriate.  Tree-based methods (of which
random forests is one) can be sensitive to heteroscedasticity.
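
A rough sketch of such a check in base R, using simulated right-skewed
counts as a stand-in for abundance data. The data, the skewness helper,
and the choice of log1p() as the transformation are all assumptions for
illustration, not a recommendation for any particular data set.

```r
set.seed(1)

# Simulated, right-skewed "abundance" counts; illustrative data only.
abundance <- rnbinom(200, size = 0.5, mu = 20)

# Hypothetical helper: sample skewness (third standardized moment).
skewness <- function(x) {
  z <- x - mean(x)
  mean(z^3) / mean(z^2)^1.5
}

skew_raw <- skewness(abundance)
skew_log <- skewness(log1p(abundance))  # log1p() copes with zero counts

# Compare the shape of the response before and after transforming.
print(summary(abundance))
print(summary(log1p(abundance)))
print(c(raw = skew_raw, log1p = skew_log))
```

If the transformed response looks more symmetric, one option is to fit
the forest to log1p(y) instead of y; the reported pseudo R^2 then refers
to the transformed scale.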

Best,
Andy 

From: lara harrup (IAH-P)
> 
> Hi all
> 
> I have been trying to use the randomForest package to model 
> insect species abundance in different habitats and identify 
> the key variables (landscape/climate etc) in determining 
> abundance, which has all worked fine and I get nice variable 
> importance plots etc. Many thanks to everyone on this help 
> forum who has given tips/advice along the way.
> 
> But the percentage variance explained / pseudo R-squared 
> reported when I call print(model) is quite low: depending on 
> the species being modelled, it ranges from a maximum of 23.69 
> right down to -2.08.
> 
> I believe that a negative value represents a model that 
> performs no better (or worse) than random, and obviously the 
> larger the R^2, the better the predictive ability, but 
> over what range does this R^2 operate?
> 
> It is not unexpected that some of these models have poor 
> predictive accuracy, as part of the larger project around 
> this work is to argue that finer-resolution remotely sensed 
> satellite imagery is needed to derive the climate variables 
> etc. being used to predict species abundance.
> 
> My question is probably a bit like "how long is a piece of 
> string?", but if anyone could offer some guidance on what 
> constitutes a good / very good / bad / very bad R-squared 
> value for random forest, it would be most appreciated. Are 
> there any other accuracy measures that can be used with 
> random forest in addition to the pseudo R^2 value? This 
> work will be presented to an entomology/ecology audience, 
> for whom machine learning is a bit outside their (and my) 
> statistics comfort zone.
> 
> Many thanks in advance
> 
> Lara
> 
> lara.harrup at bbsrc.ac.uk