[R] Can't seem to finish a randomForest.... Just goes and goes!

Bill.Venables@csiro.au Bill.Venables at csiro.au
Mon Apr 5 08:40:35 CEST 2004


Alternatively, if you can arrive at a sensible ordering of the levels
you can declare them ordered factors and make the computation feasible
once again.
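
A minimal sketch of what that could look like, in base R only. The toy data frame `toy` and its columns `f` and `y` are hypothetical stand-ins for the real data. Ordering the levels by the observed proportion of the response is just one possible choice, assumed here for illustration; an ordering based on subject-matter knowledge would usually be preferable, since this one peeks at the response.

```r
# Toy stand-in for the real data: a 30-level factor and a 0/1 response.
# The names `toy`, `f` and `y` are hypothetical.
set.seed(1)
toy <- data.frame(
  f = factor(sample(sprintf("lev%02d", 1:30), 500, replace = TRUE)),
  y = factor(rbinom(500, 1, 0.5))
)

# One possible ordering (an assumption, not the only sensible one):
# sort the levels by the observed proportion of y == "1" within each level.
prop1 <- tapply(toy$y == "1", toy$f, mean)
toy$f_ord <- factor(toy$f, levels = names(sort(prop1)), ordered = TRUE)

# The new column is an ordered factor with the same values as before,
# so a tree only needs to consider cut points between adjacent levels.
is.ordered(toy$f_ord)
nlevels(toy$f_ord)
```

With k ordered levels there are only k - 1 candidate cut points, instead of the exponentially many subsets of an unordered factor.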

Bill Venables.

-----Original Message-----
From: r-help-bounces at stat.math.ethz.ch
[mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Torsten Hothorn
Sent: Monday, 5 April 2004 4:27 PM
To: David L. Van Brunt, Ph.D.
Cc: R-Help
Subject: Re: [R] Can't seem to finish a randomForest.... Just goes and
goes!


On Sun, 4 Apr 2004, David L. Van Brunt, Ph.D. wrote:

> Playing with randomForest, samples run fine. But on real data, no go.
>
> Here's the setup: OS X, same behavior whether I'm using R-Aqua 1.8.1 
> or the Fink compile-of-my-own with X-11, R version 1.8.1.
>
> This is on OS X 10.3 (aka "Panther"), G4 800 MHz with 512 MB physical
> RAM.
>
> I have not altered the Startup options of R.
>
> Data set is read in from a text file with "read.table", and has 46 
> variables and 1,855 cases. Trying the following:
>
> The DV is categorical, 0 or 1. Most of the IVs are either continuous,
> or correctly read in as factors. The largest factor has 30 levels.... Only the
                                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

This means there are on the order of 2^(30-1) = 536,870,912 possible
splits to be evaluated every time this variable is picked up (minus
something due to empty levels). At least the last time I looked at the
code, randomForest used an exhaustive search over all possible splits.
Try reducing the number of levels to something reasonable (or, for a
first shot, remove this variable from the learning sample).
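
To make the arithmetic concrete, here is a small base-R sketch. Counted exactly, an unordered factor with k levels has 2^(k-1) - 1 distinct binary splits (one fewer than 2^(k-1), because the split that sends every level to the same side is excluded). The function names and the helper `many_levels()` are hypothetical, made up for this illustration; they are not part of randomForest.

```r
# Number of distinct binary splits of an unordered factor with k levels:
# each split sends a non-empty proper subset of levels to one side, and a
# subset and its complement describe the same split, hence 2^(k - 1) - 1.
n_splits_unordered <- function(k) 2^(k - 1) - 1

# An ordered factor only allows cut points between adjacent levels.
n_splits_ordered <- function(k) k - 1

n_splits_unordered(30)  # 536870911
n_splits_ordered(30)    # 29

# Hypothetical helper: report factor columns with more than `max_levels`
# levels, as candidates for collapsing, ordering, or dropping.
many_levels <- function(df, max_levels = 10) {
  nl <- vapply(df, function(x) if (is.factor(x)) nlevels(x) else 0L,
               integer(1))
  nl[nl > max_levels]
}
```

Something like `many_levels(Mydata)` would then list the offending columns by name, assuming `Mydata` is the data frame from the original post.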

Best,

Torsten


> DV seems to need identifying as a factor to force class trees over
> regression:
>
> > Mydata$V46 <- as.factor(Mydata$V46)
> > Myforest.rf <- randomForest(V46 ~ ., data = Mydata, ntree = 100, mtry = 7,
>                               proximity = FALSE, importance = FALSE)
>
> 5 hours later, R.bin was still taking up 75% of my processor. When
> I've tried this with larger data, I get errors referring to the buffer
> (sorry, not in front of me right now).
>
> Any ideas on this? The data don't seem horrifically large. Seems like 
> there are a few options for setting memory size, but I'm not sure
> which of them to try tweaking, or if that's even the issue.
>
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list 
> https://www.stat.math.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide! 
> http://www.R-project.org/posting-guide.html
>
>
