[R] Random Forest - Strata

Tim Howard tghoward at gw.dec.state.ny.us
Wed Jul 21 13:11:24 CEST 2010


Coll,

An alternative approach is to do the subsetting yourself before sending the data to RF, treating each site in turn as an external validation group:
- hold out Site A and build an RF model (Model 1) on Sites B and C.
- validate Model 1 by running predict on Site A, then use ROCR or other evaluation metrics to assess how well it performs.
- hold out Site B and build an RF model (Model 2) on Sites A and C.
- validate Model 2 by predicting presence in Site B.
- continue through all your sites.

This is called 'leave-one-out' (here, one whole site is left out at a time) and is used in some fields for model validation. Your final accuracy estimates could be based on the averages of the values obtained for each model.
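A minimal sketch of the steps above, using the ten example rows from Coll's message below. The names dat, sites, and acc are made up for illustration, and plain classification accuracy stands in for whatever metric (e.g. from ROCR) you would actually use. With so few rows the numbers mean nothing; it only shows the shape of the loop.

```r
library(randomForest)

# The ten example rows from the original question (illustrative only)
dat <- data.frame(
  Site     = factor(c("A", "A", "A", "B", "B", "B", "B", "C", "C", "C")),
  Presence = factor(c("Yes", "No", "No", "Yes", "No", "No", "No",
                      "Yes", "No", "No"), levels = c("No", "Yes")),
  Length   = c(3.50, 3.90, 3.60, 2.60, 2.20, 2.60, 3.00, 3.50, 3.40, 3.10),
  Sulphur  = c(19.42, 51.09, 26.75, 9.71, 9.77, 8.60, 35.59,
               16.07, 49.96, 35.35)
)

set.seed(1)  # forests are random; fix the seed for repeatability
sites <- levels(dat$Site)

# For each site: train on the other sites, predict on the held-out site
acc <- sapply(sites, function(s) {
  train <- dat[dat$Site != s, ]
  test  <- dat[dat$Site == s, ]
  fit   <- randomForest(Presence ~ Length + Sulphur, data = train)
  pred  <- predict(fit, newdata = test)
  mean(pred == test$Presence)  # accuracy on the held-out site
})

acc        # one value per held-out site
mean(acc)  # average across the per-site models
```

Site is left out of the formula on purpose, since the held-out site is never seen during training.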

Hope that helps.
Tim



------------------------------

Message: 44
Date: Tue, 20 Jul 2010 08:48:04 -0700 (PDT)
From: Coll <gbcoll2 at gmail.com>
To: r-help at r-project.org 
Subject: [R] Random Forest - Strata
Message-ID: <1279640884553-2295731.post at n4.nabble.com>
Content-Type: text/plain; charset=us-ascii


Hi all,

I have been struggling to get "strata" in randomForest to work for this.

Can I get randomForest, for each TREE, to use ALL samples from some
strata to build the tree, while leaving other strata TOTALLY untouched as oob?

e.g. with the data below, how can I tell RF to:
- for tree 1 in the forest, use only Site A and B to build the tree,
while using the WHOLE Site C data for the oob error rate,
- for tree 2, use only Site A and C to build the tree, while using the whole
Site B data for oob,
- for tree 3, use Site B and C, with A as oob...?

My command does not work, as it uses some samples from all of the sites:
rforest.obj <- randomForest(Presence.f ~., data=dataset.subset, strata =
site.factor)

while setting the corresponding "sampsize" argument, it seems, would only
screen out that Site from all tree building...

Site  Presence  Length  Sulphur
A     Yes       3.50    19.42
A     No        3.90    51.09
A     No        3.60    26.75
B     Yes       2.60     9.71
B     No        2.20     9.77
B     No        2.60     8.60
B     No        3.00    35.59
C     Yes       3.50    16.07
C     No        3.40    49.96
C     No        3.10    35.35

Any ideas / comments are welcome.

Thanks in advance.

Coll
-- 
View this message in context: http://r.789695.n4.nabble.com/Random-Forest-Strata-tp2295731p2295731.html 
Sent from the R help mailing list archive at Nabble.com.


