[R] ecological meaning of randomForest vegetation classification? [Broadcast]

Liaw, Andy andy_liaw at merck.com
Wed Sep 5 19:49:09 CEST 2007


Hi Christoph,

I'm not exactly sure what you're looking for, but I'll take a stab
anyway.

The trees in a random forest are not designed to be interpreted as one
would an "ordinary" tree.  There are several things you can try to see
whether they help.  One is the distribution of votes.  It looks like you
are classifying each data point into one of many possible classes.  RF
will give you the fraction of trees in the forest that classified the
observation as a particular class (and the class with the highest
fraction of votes is the "predicted class").  Another is the partial
dependence plot:  You can use plot(importance(rf.object)) to see which
variables are the most important, and then use partialPlot() to examine
their marginal effects.  These offer some clue as to what the RF black
box is doing, and hopefully will make some sense to you.
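
A minimal sketch of the suggestions above, not tied to the original
data: the built-in iris data stand in for the vegetation model output
(predictors playing the role of plant-type fractions, Species playing
the role of land-cover class), and the object names rf, votes, probs
and imp are made up for illustration.  varImpPlot() is used here for
the importance plot in place of plot(importance(rf.object)).

```r
library(randomForest)

set.seed(1)
# Assumption: iris is only a stand-in for the real data.
# importance = TRUE is needed for permutation-based importance.
rf <- randomForest(Species ~ ., data = iris, ntree = 200,
                   importance = TRUE)

# 1. Distribution of votes: one row per observation, one column per
#    class, each row giving the out-of-bag fraction of trees voting
#    for that class.
votes <- rf$votes
head(votes)

# The same kind of vote fractions for new data (here iris again,
# purely for illustration).
probs <- predict(rf, newdata = iris, type = "prob")

# 2. Variable importance: a matrix with one column per class plus
#    MeanDecreaseAccuracy and MeanDecreaseGini.
imp <- importance(rf)
print(imp)
varImpPlot(rf)

# 3. Partial dependence of one class on one predictor.
partialPlot(rf, pred.data = iris, x.var = "Petal.Length",
            which.class = "setosa")
```

The per-class columns of importance() may be a starting point for
checking whether a class leans on ecologically plausible predictors,
and partialPlot() with which.class set to that class shows the
direction of the marginal effect.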

Best,
Andy 

From: Christoph Muller
> 
> Hi, everyone,
> 
> I haven't found anything similar in the forum, so here's my problem
> (I'm no expert in R or statistics):
> 
> I have a data set of 59,000 cases with 9 variables each (fractional
> coverage of 9 different plant types, such as deciduous broad-leaved
> temperate trees or evergreen tropical trees etc.), which was generated
> by a vegetation model.
> In order to evaluate the quality of the vegetation model's output, I
> want to compare it to a land-cover data set which has 23 different
> land-cover types (such as needle-leaved evergreen forest, dense
> broad-leaved forest, barren, etc.).
> A statistician advised me to use the randomForest package in R, and
> using a subset to generate the random forest, I get a very good
> prediction for the rest.
> However, I need to evaluate how meaningful this classification is in
> an ecological sense (boreal trees should not play a role in the
> definition of tropical land-cover types, for example), otherwise I
> cannot judge the quality of the vegetation model's output.
> 
> Unfortunately, randomForest gives me about 15,000 splits, of which
> about 5,000 are end branches (rough guess), so it's very hard and
> time-consuming to check each single branch of one of the final trees
> for its ecological meaning.
> Is there any utility to summarize the characteristics of each of the
> 23 prediction classes? Such as "land-cover class 1 has less than 5% of
> plant types 1-5, 20-50% of plant type 7 and at least 30% of plant
> type 8".
> Or is there a more suitable method to classify my data?
> 
> Thanks a lot in advance!
> 
> Christoph

