[R] Random Forest: OOB performance = test set performance?

thebudget72 using gmail.com
Mon Apr 12 02:49:53 CEST 2021


Thanks Peter.

Indeed, after setting a seed, the two results are similar.

I am self-studying and wanted to make sure I understood the concept of
OOB samples and how reliable the performance metrics calculated on them
are.

It seems I did get it. That's good :)
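
For anyone reading the archive later, here is a minimal sketch of the seeded comparison, repeated over several splits so the run-to-run variation Peter describes is visible. It reuses the setup from the quoted code below. The helper name compare_acc and the seeds 1 to 10 are only illustrative assumptions, not something from the original code, and no particular numbers are claimed here.

```r
library(randomForest)
library(ISLR)

# Same response variable as in the quoted code
Carseats$High <- as.factor(ifelse(Carseats$Sales <= 8, "No", "Yes"))

# Hypothetical helper: one seeded train/test split,
# returns the OOB accuracy and the test set accuracy
compare_acc <- function(seed) {
  set.seed(seed)  # fixes both the split and the forest
  train <- sample(1:nrow(Carseats), 200)
  rf <- randomForest(High ~ . - Sales,
                     data = Carseats,
                     subset = train,
                     mtry = 6)
  # rf$confusion carries a third column (class.error),
  # so only the two count columns are used here
  counts <- rf$confusion[, 1:2]
  oob <- sum(diag(counts)) / sum(counts)
  yhat <- predict(rf, newdata = Carseats[-train, ])
  test <- mean(yhat == Carseats$High[-train])
  c(oob = oob, test = test)
}

# One row per seed: OOB vs. test accuracy
res <- t(sapply(1:10, compare_acc))
print(round(res * 100, 2))
print(round(colMeans(res) * 100, 2))
```

With a fixed seed each row is reproducible, and looking at several seeds side by side should show whether one estimate is systematically above the other or whether the gap just changes sign from split to split.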

On 4/11/21 6:34 AM, Peter Langfelder wrote:
> I think the only thing you are doing wrong is not setting the random
> seed (set.seed()) so your results are not reproducible. Depending on
> the random sample used to select the training and test sets, you get
> slightly varying accuracy for both, sometimes one is better and
> sometimes the other.
>
> HTH,
>
> Peter
>
> On Sat, Apr 10, 2021 at 8:49 PM <thebudget72 using gmail.com> wrote:
>> Hi ML,
>>
>> For random forest, I thought that the out-of-bag performance should be
>> the same as (or at least very similar to) the performance calculated on
>> a separate test set.
>>
>> But this does not seem to be the case.
>>
>> In the following code, the accuracy computed on the out-of-bag samples
>> is 77.81%, while the one computed on a separate test set is 81%.
>>
>> Can you please check what I am doing wrong?
>>
>> Thanks in advance and best regards.
>>
>> library(randomForest)
>> library(ISLR)
>>
>> Carseats$High <- ifelse(Carseats$Sales<=8,"No","Yes")
>> Carseats$High <- as.factor(Carseats$High)
>>
>> train = sample(1:nrow(Carseats), 200)
>>
>> rf = randomForest(High~.-Sales,
>>                     data=Carseats,
>>                     subset=train,
>>                     mtry=6,
>>                     importance=TRUE)
>>
>> # rf$confusion has a third column (class.error); sum only the counts
>> acc <- (rf$confusion[1,1] + rf$confusion[2,2]) / sum(rf$confusion[,1:2])
>> print(paste0("Accuracy OOB: ", round(acc*100,2), "%"))
>>
>> yhat <- predict(rf, newdata=Carseats[-train,])
>> y <- Carseats[-train,]$High
>> conftest <- table(y, yhat)
>> acctest <- (conftest[1,1] + conftest[2,2]) / sum(conftest)
>> print(paste0("Accuracy test set: ", round(acctest*100,2), "%"))