[R] Can't seem to finish a randomForest.... Just goes and goes!
Torsten.Hothorn at rzmail.uni-erlangen.de
Mon Apr 5 08:27:17 CEST 2004
On Sun, 4 Apr 2004, David L. Van Brunt, Ph.D. wrote:
> Playing with randomForest, samples run fine. But on real data, no go.
> Here's the setup: OS X, same behavior whether I'm using R-Aqua 1.8.1 or the
> Fink compile-of-my-own with X-11, R version 1.8.1.
> This is on OS X 10.3 (aka "Panther"), G4 800 MHz with 512 MB physical RAM.
> I have not altered the Startup options of R.
> Data set is read in from a text file with "read.table", and has 46 variables
> and 1,855 cases. Trying the following:
> The DV is categorical, 0 or 1. Most of the IVs are either continuous, or
> correctly read in as factors. The largest factor has 30 levels.... Only the
This means: there are 2^(30-1) = 536,870,912 possible splits to be
evaluated every time this variable is picked (minus something due to
empty levels). At least the last time I looked at the code, randomForest
used an exhaustive search over all possible splits. Try reducing the
number of levels to something reasonable (or, for a first shot, remove this
variable from the learning sample).
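
To make the arithmetic concrete, here is a minimal sketch in base R. It
assumes the exhaustive search described above, and the names n_splits and
collapse_levels are hypothetical helpers written for this illustration, not
part of randomForest. Counting each left/right partition once and dropping
the trivial split gives 2^(k-1) - 1 candidates for a factor with k levels.

```r
# Number of distinct binary splits of an unordered factor with k levels:
# every way to send a non-empty proper subset of levels to one side,
# with each left/right pair counted once.
n_splits <- function(k) 2^(k - 1) - 1

n_splits(30)  # 2^29 - 1
n_splits(8)   # 2^7 - 1

# Hypothetical helper (an assumption, not a randomForest function):
# keep the `keep` most frequent levels and pool all others into one level.
collapse_levels <- function(f, keep = 8, other = "other") {
  f <- factor(f)                                  # drops unused (empty) levels
  counts <- sort(table(f), decreasing = TRUE)     # level frequencies, largest first
  top <- names(counts)[seq_len(min(keep, length(counts)))]
  x <- as.character(f)
  x[!is.na(x) & !(x %in% top)] <- other           # pool the rare levels
  factor(x)
}

# Small made-up example: four levels reduced to two plus "other".
g <- c(rep("a", 5), rep("b", 3), "c", "d")
g2 <- collapse_levels(g, keep = 2)
nlevels(g2)
```

Whether pooling rare levels is sensible depends on what the factor means;
the point is only that going from 30 levels to around 8 cuts the number of
candidate splits for that variable from hundreds of millions to about a
hundred.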
> DV seems to need identifying as a factor to force class trees over
> , importance=FALSE)
> 5 hours later, R.bin was still taking up 75% of my processor. When I've
> tried this with larger data, I get errors referring to the buffer (sorry,
> not in front of me right now).
> Any ideas on this? The data don't seem horrifically large. Seems like there
> are a few options for setting memory size, but I'm not sure which of them
> to try tweaking, or if that's even the issue.