[R] Can't seem to finish a randomForest.... Just goes and goes!

David L. Van Brunt, Ph.D. dvanbrunt at well-wired.com
Tue Apr 6 02:16:44 CEST 2004


D'OH!

I clearly just needed to Re-RTFM!!! I had a column still coded as TEXT
(yup, "Monday", etc.), and the randomForest manual by Breiman says such
variables need to be numerically coded. Easy recode. I'll try running it
RIGHT this time, and let you all know how it goes. Grumble mumble mumble....
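For the record, here is a minimal sketch of that recode in base R. The vector `day` is a hypothetical stand-in for the offending column (its real name in `Mydata` isn't given in this thread), and both a plain integer coding and Bill's ordered-factor alternative are shown:

```r
# Hypothetical stand-in for the text column; in the real data this would be
# a column of Mydata as read in by read.table().
day <- c("Monday", "Wednesday", "Friday", "Monday", "Sunday")

# Spell out the level order; the default alphabetical order is meaningless here.
day_levels <- c("Monday", "Tuesday", "Wednesday", "Thursday",
                "Friday", "Saturday", "Sunday")

# Option 1: an ordered factor (keeps the labels, and the level order is kept).
day_ord <- factor(day, levels = day_levels, ordered = TRUE)

# Option 2: integer codes, Monday = 1, ..., Sunday = 7.
day_num <- as.integer(factor(day, levels = day_levels))

# Any spelling not listed in day_levels would silently become NA, so check.
stopifnot(!anyNA(day_num))

day_num  # 1 3 5 1 7
```

Either version gives a tree cut points along the week instead of arbitrary groupings of day names.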

On 4/5/04 1:40, "Bill.Venables at csiro.au" <Bill.Venables at csiro.au> wrote:

> Alternatively, if you can arrive at a sensible ordering of the levels
> you can declare them ordered factors and make the computation feasible
> once again.
> 
> Bill Venables.
> 
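To put numbers on why the ordering matters: the usual way to count candidate binary splits of a predictor with k non-empty levels is k - 1 for an ordered factor (one cut point between adjacent levels) versus 2^(k-1) - 1 for an unordered one (every division of the levels into two non-empty groups). This is the general counting argument, not something read out of the randomForest source, and `n_splits` is a hypothetical helper:

```r
# Hypothetical helper: number of distinct binary splits of one predictor
# with k non-empty levels.
n_splits <- function(k, ordered) {
  if (ordered) {
    k - 1          # one cut point between each pair of adjacent levels
  } else {
    2^(k - 1) - 1  # every split of the levels into two non-empty groups
  }
}

n_splits(30, ordered = FALSE)  # 536870911, about 5.4e8
n_splits(30, ordered = TRUE)   # 29
n_splits(7, ordered = FALSE)   # 63, e.g. an unordered weekday factor
```

So even a seven-level weekday variable costs 63 candidate splits unordered versus 6 ordered, and the gap grows exponentially with the number of levels.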
> -----Original Message-----
> From: r-help-bounces at stat.math.ethz.ch
> [mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Torsten Hothorn
> Sent: Monday, 5 April 2004 4:27 PM
> To: David L. Van Brunt, Ph.D.
> Cc: R-Help
> Subject: Re: [R] Can't seem to finish a randomForest.... Just goes and goes!
> 
> 
> On Sun, 4 Apr 2004, David L. Van Brunt, Ph.D. wrote:
> 
>> Playing with randomForest, samples run fine. But on real data, no go.
>> 
>> Here's the setup: OS X, same behavior whether I'm using R-Aqua 1.8.1
>> or my own Fink compile with X11, R version 1.8.1.
>> 
>> This is on OS X 10.3 (aka "Panther"), G4 800 MHz with 512 MB physical
>> RAM.
>> 
>> I have not altered the Startup options of R.
>> 
>> Data set is read in from a text file with "read.table", and has 46
>> variables and 1,855 cases. Trying the following:
>> 
>> The DV is categorical, 0 or 1. Most of the IVs are either continuous
>> or correctly read in as factors. The largest factor has 30 levels....
>                                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>> Only the
> 
> This means: there are 2^(30-1) = 536,870,912 possible splits to be
> evaluated every time this variable is picked up (minus something due to
> empty levels). At least the last time I looked at the code, randomForest
> used an exhaustive search over all possible splits. Try reducing the
> number of levels to something reasonable (or, for a first shot, remove
> this variable from the learning sample).
> 
> Best,
> 
> Torsten
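Torsten's suggestion of cutting the factor down to a reasonable number of levels could be sketched as below. `lump_levels` is a hypothetical helper, not part of randomForest, and pooling the least frequent levels into an "Other" level is only one possible rule; merging levels that mean similar things is often the better choice. The column name in the commented-out last line is also made up:

```r
# Hypothetical helper (not from any package): keep the (max_levels - 1) most
# frequent levels of a factor and pool every other level into `other`.
lump_levels <- function(f, max_levels, other = "Other") {
  f <- droplevels(as.factor(f))            # drop empty levels first
  if (nlevels(f) <= max_levels) return(f)  # already small enough
  counts <- sort(table(f), decreasing = TRUE)
  keep <- names(counts)[seq_len(max_levels - 1)]
  out <- as.character(f)
  out[!is.na(out) & !(out %in% keep)] <- other  # NAs stay NA
  factor(out, levels = unique(c(keep, other)))
}

# Toy example: five levels with counts a = 5, b = 3, c = 2, d = 1, e = 1.
x <- factor(c(rep("a", 5), rep("b", 3), rep("c", 2), "d", "e"))
y <- lump_levels(x, max_levels = 3)
table(y)  # a = 5, b = 3, Other = 4

# Or, as a first shot, drop the column altogether (V12 is a made-up name):
# Mydata$V12 <- NULL
```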
> 
> 
>> DV seems to need identifying as a factor to force class trees over
>> regression:
>> 
>>> Mydata$V46 <- as.factor(Mydata$V46)
>>> Myforest.rf <- randomForest(V46 ~ ., data = Mydata, ntree = 100, mtry = 7,
>>+     proximity = FALSE, importance = FALSE)
>> 
>> 5 hours later, R.bin was still taking up 75% of my processor. When
>> I've tried this with larger data, I get errors referring to the buffer
>> (sorry, not in front of me right now).
>> 
>> Any ideas on this? The data don't seem horrifically large. Seems like
>> there are a few options for setting memory size, but I'm not sure
>> which of them to try tweaking, or if that's even the issue.
>> 
>> ______________________________________________
>> R-help at stat.math.ethz.ch mailing list
>> https://www.stat.math.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide!
>> http://www.R-project.org/posting-guide.html
>> 
>> 

-- 
David L. Van Brunt, Ph.D.
Outlier Consulting & Development
mailto: <ocd at well-wired.com>