[R] Random Forest & Cross Validation

ronzhao yzhaohsph at gmail.com
Sun Feb 20 01:02:23 CET 2011


Hi,
I am using the randomForest package for a prediction task on GWAS data. I
first split the data into training and test sets (70% vs 30%), then used the
training set to grow the trees (ntree=100000). The OOB error on the training
set looks good (<10%), but performance on the test set is poor, with an AUC
of only about 50%.
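For reference, the 70/30 split can be done along these lines ("Dat" here is
just a placeholder name for the full data frame, not my actual object):

set.seed(1)
idx   <- sample(nrow(Dat), size = round(0.7 * nrow(Dat)))
Train <- Dat[idx, ]    # 70% for growing the forest
Test  <- Dat[-idx, ]   # 30% held out for testing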
Although some people say that no cross-validation is necessary for RF because
of the OOB estimate, I still feel uneasy and think an independent test set is
important. I am quite frustrated with these results.
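In case it helps to show what I mean, a rough sketch of 5-fold cross-validation
on the training data might look like this (assuming Train and a factor
PHENOTYPE column as in the code below; ntree is reduced here just for
illustration):

library(randomForest)
set.seed(1)
folds <- sample(rep(1:5, length.out = nrow(Train)))   # assign each row to a fold
cv.err <- sapply(1:5, function(k) {
  fit  <- randomForest(PHENOTYPE ~ ., data = Train[folds != k, ], ntree = 500)
  pred <- predict(fit, Train[folds == k, ])
  mean(pred != Train$PHENOTYPE[folds == k])            # held-out misclassification rate
})
mean(cv.err)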


Does anyone have any suggestions?

Thanks.

PS: example code I used

library(randomForest)

## fit on the training set (PHENOTYPE is a factor, so this is a classification forest)
RF <- randomForest(PHENOTYPE ~ ., data = Train, importance = TRUE, ntree = 20000, do.trace = 5000)

rownames(Test) <- Test$IID
## predicted class probabilities on the held-out test set
Pred <- predict(RF, Test, type = "prob")
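The test-set AUC I quoted was computed from those probabilities with something
like the following (a sketch assuming the pROC package and that the second
column of Pred holds the probability of the case class):

library(pROC)
## ROC curve of true phenotype vs. predicted case probability
roc.obj <- roc(Test$PHENOTYPE, Pred[, 2])
auc(roc.obj)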




