[R] FW: Random Forest - Strata and sampsize and replace

Lopez, Dan lopez235 at llnl.gov
Tue Nov 18 17:32:32 CET 2014


Hello R Experts,

I want to make sure I understand how the strata, sampsize and replace parameters work so I can confidently perform downsampling on a dataset I'm working with.

My main question: when the documentation describes how each of these parameters (strata, sampsize, replace) works, does that description apply per tree? Below is my understanding. Can you tell me whether I have it right?


library(randomForest)

table(iris$Species)
#    setosa versicolor  virginica
#        50         50         50

# The default for replace is TRUE.
# EACH tree uses a sample of 150. Because a given tree samples with replacement, it is possible that only one class is represented, e.g. only setosa, with each setosa observation appearing 3 times on average.
randomForest(Species ~ ., data = iris)
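To make the default per-tree draw concrete, the sketch below mimics one tree's unstratified bootstrap sample in base R. It assumes, as described above, that replace = TRUE and that each tree draws nrow(iris) = 150 rows. It illustrates the sampling scheme with `sample()` only and is not randomForest's internal code. The last lines put a number on the "only setosa" case.

```r
# Sketch: mimic ONE tree's default (unstratified) bootstrap draw in base R.
# Assumes replace = TRUE and a per-tree sample size of nrow(iris) = 150.
set.seed(1)
n <- nrow(iris)
idx <- sample(n, size = n, replace = TRUE)   # row indices drawn for one tree

length(idx)               # 150 draws, the same size as the data
table(iris$Species[idx])  # class counts vary from draw to draw
sum(duplicated(idx))      # draws that repeat a row already drawn

# Probability that all 150 draws land in one given class (e.g. setosa):
# each independent draw is setosa with probability 50/150 = 1/3.
p_all_setosa <- (1/3)^150
p_all_setosa              # about 2.7e-72: possible in principle, vanishingly unlikely
```

Repeating this with a different seed gives different class counts each time, which matches the idea that every tree gets its own independent draw.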


# EACH tree uses a sample of 30 -- 10 from each class. Observations from each class may be repeated.
randomForest(Species ~ ., data = iris,
             sampsize = c(setosa = 10, versicolor = 10, virginica = 10),
             strata = iris$Species)

# EACH tree uses a sample of 60 -- 10 from the 1st class (setosa), 20 from the 2nd (versicolor) and 30 from the 3rd (virginica). Observations from each class may be repeated.
randomForest(Species ~ ., data = iris,
             sampsize = c(setosa = 10, versicolor = 20, virginica = 30),
             strata = iris$Species)
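Under the per-tree reading described above, one tree's stratified draw can be mimicked in base R. This sketch assumes that the `sampsize` entries are matched to the levels of `strata` by name and that each stratum is sampled independently. `stratified_draw` is a hypothetical helper, not part of randomForest. It illustrates the scheme and is not the package's actual implementation.

```r
# Hypothetical helper (not part of randomForest): mimic one tree's stratified
# draw, taking sampsize[k] rows from stratum k, with replacement by default.
stratified_draw <- function(strata, sampsize, replace = TRUE) {
  strata <- as.factor(strata)
  # Assumption: sampsize entries are matched to the factor levels by name.
  sampsize <- sampsize[levels(strata)]
  unlist(lapply(levels(strata), function(lev) {
    rows <- which(strata == lev)             # row indices in this stratum
    rows[sample.int(length(rows), size = sampsize[[lev]], replace = replace)]
  }))
}

set.seed(42)
idx_equal <- stratified_draw(iris$Species,
                             c(setosa = 10, versicolor = 10, virginica = 10))
table(iris$Species[idx_equal])    # 10 / 10 / 10, 30 rows in total

idx_unequal <- stratified_draw(iris$Species,
                               c(setosa = 10, versicolor = 20, virginica = 30))
table(iris$Species[idx_unequal])  # 10 / 20 / 30, 60 rows in total

# With replace = FALSE no row can be drawn twice within a tree's sample.
idx_norep <- stratified_draw(iris$Species,
                             c(setosa = 10, versicolor = 20, virginica = 30),
                             replace = FALSE)
any(duplicated(idx_norep))        # FALSE
```

The class totals are fixed by `sampsize` on every draw, while which rows are picked (and whether any repeat) changes from tree to tree.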

Dan




