[R] memory problems when combining randomForests [Broadcast]

Eleni Rapsomaniki e.rapsomaniki at mail.cryst.bbk.ac.uk
Thu Jul 27 17:07:55 CEST 2006


I'm using R (Windows) version 2.1.1, randomForest version 4.15.
I call randomForest like this:

my.rf <- randomForest(x = train.df[, -response_index], y = train.df[, response_index],
                      xtest = test.df[, -response_index], ytest = test.df[, response_index],
                      importance = TRUE, proximity = FALSE, keep.forest = TRUE)

 (where train.df and test.df are my training and test data.frames and
 response_index is the column number specifying the class)

I then save each tree to a file so that I can combine them all afterwards. There
are no memory issues when keep.forest=FALSE, but I think that is the part I need
for future predictions (right?).

I did check previous messages on memory issues, and thought that combining the
trees afterwards would solve the problem. Since my cross-validation subsets give
me a fairly stable error rate, I suppose I could just use a randomForest trained
on a subset of my data. But wouldn't I be "wasting" data that way?
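In case it clarifies what I'm attempting, here is a rough sketch of the subset-then-combine idea, using iris as a stand-in for my real data and combine() from the randomForest package:

```r
library(randomForest)

## Sketch only: iris stands in for my real data.
## Grow two small forests on disjoint row subsets, then merge them
## with combine() into one forest usable for prediction.
set.seed(1)
idx   <- sample(nrow(iris))
half1 <- iris[idx[1:75], ]
half2 <- iris[idx[76:150], ]

rf1 <- randomForest(Species ~ ., data = half1, ntree = 50, keep.forest = TRUE)
rf2 <- randomForest(Species ~ ., data = half2, ntree = 50, keep.forest = TRUE)

rf.all <- combine(rf1, rf2)          # one forest with 100 trees
rf.all$ntree                         # 100
preds  <- predict(rf.all, newdata = iris)
```

(If I understand the documentation correctly, combine() does not recompute the error components of the merged forest, so err.rate etc. are not meaningful afterwards, but the forest itself can still be used with predict().)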

A bit off the subject, but should the order in which rows (i.e. sets of
explanatory variables) are passed to the randomForest function affect the
result? I have noticed that if I pick a random, unordered sample from my control
data for training, the error rate is much lower than if I take an ordered
sample. This holds across all my cross-validation results.
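For concreteness, this is the kind of row shuffling I mean, sketched in base R with iris standing in for my control data (my.data and n.train are placeholders):

```r
## Sketch only: iris stands in for my real data frame.
set.seed(42)                                  # arbitrary seed, for reproducibility
my.data  <- iris
n.train  <- 100                               # placeholder training-set size
shuffled <- my.data[sample(nrow(my.data)), ]  # random, unordered permutation of rows
train.df <- shuffled[1:n.train, ]
test.df  <- shuffled[(n.train + 1):nrow(shuffled), ]
```

Training on train.df drawn this way gives me the lower error rate; taking the first n.train rows of the original (ordered) data does not.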

I'm sorry for my many questions.
Many Thanks
Eleni Rapsomaniki


