[R] RandomForests Limitations? Work Arounds?

Liaw, Andy andy_liaw at merck.com
Tue Sep 7 23:07:18 CEST 2010


You're not giving us much to go on, so the info I can give is
correspondingly vague.

I take it you are using RF in "unsupervised" mode.  What RF does in this
case is simply generate a second, synthetic part of the data that has the
same marginal distributions as the data you have, but with the variables
independent.  It then runs a classification treating your data as one
class and the generated data as the other class.  The output is the
proximity matrix, which you can use as the similarity matrix for
clustering.
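A minimal base-R sketch of the two-class setup just described.  It assumes
the synthetic rows are drawn by resampling each column independently with
replacement, which keeps every marginal distribution but breaks the
dependence between variables.  The helper name make_synthetic_class is
hypothetical and not part of randomForest; the package builds this second
class internally when called without a response.

```r
# Hypothetical helper: build the "real vs. synthetic" classification problem.
# Assumption: each synthetic column is an independent resample (with
# replacement) of the corresponding real column.
make_synthetic_class <- function(x) {
  synth <- as.data.frame(lapply(x, function(col) sample(col, replace = TRUE)))
  names(synth) <- names(x)
  list(
    x = rbind(x, synth),                               # 2n rows: real, then synthetic
    y = factor(rep(c("real", "synthetic"), each = nrow(x)))
  )
}

set.seed(1)
d <- make_synthetic_class(iris[, 1:4])
dim(d$x)     # 300 rows (150 real + 150 synthetic), 4 columns
table(d$y)   # 150 of each class
```

Note that the stacked data are already twice the size of the input.  For
clustering, the resulting proximities are usually turned into a
dissimilarity first, e.g. hclust(as.dist(1 - rf$proximity)).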

Given that, RF basically has to use twice as much memory to store the
data.  That's one place where it can take lots of memory.  The second
place is the storage of the proximity matrix itself:  if you have n rows
in your data, the proximity matrix is n by n, so its size grows with the
square of n.  For moderate n or larger, this is going to be the part that
takes up most of the memory.
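A back-of-the-envelope check of those two allocations, assuming every
predictor is stored as an 8-byte double and the proximity matrix is a dense
n by n matrix of doubles.  The function name is hypothetical, and real peak
usage will be higher (temporary copies, the forest itself, R's overhead).

```r
# Hypothetical helper: rough size in GB (1e9 bytes) of the two big allocations.
# Assumption: 8 bytes per value; everything else is ignored.
rf_unsupervised_mem_gb <- function(n, p) {
  bytes_per_double <- 8
  data_bytes <- 2 * n * p * bytes_per_double  # real + synthetic copies of the data
  prox_bytes <- n * n * bytes_per_double      # dense n x n proximity matrix
  c(data = data_bytes, proximity = prox_bytes) / 1e9
}

rf_unsupervised_mem_gb(n = 20000, p = 50)
# data ~0.016 GB, proximity ~3.2 GB
```

The doubled data are tiny next to the proximity matrix: at n = 20,000 it
alone needs about 3.2 GB, far more than a 32-bit build of R can usually
find in one contiguous block.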

Just in case you haven't seen/heard: avoid the formula interface (i.e.,
randomForest(~., data=mydata, ...)), because that can really soak up
memory.
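A sketch of the two call styles, using a made-up numeric data frame in
place of mydata (hypothetical, for illustration only).  Passing the data
frame as x and leaving out y dispatches to the default method and runs it
in unsupervised mode; the formula method, by contrast, first builds model
frames from the data, i.e. extra copies of it.

```r
library(randomForest)

# Hypothetical data standing in for 'mydata' from the post.
set.seed(42)
mydata <- as.data.frame(matrix(rnorm(500 * 5), ncol = 5))

# Formula interface: works, but copies the data into model frames first.
# rf <- randomForest(~ ., data = mydata, proximity = TRUE)

# Default interface: x only, no y, so RF runs in unsupervised mode.
rf <- randomForest(x = mydata, proximity = TRUE, ntree = 200)

dim(rf$proximity)   # n by n, here 500 x 500
```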

Yes, a 64-bit OS and 64-bit R can help, but only if you have the RAM to
take advantage of the platform.
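Two quick checks in base R: whether the running build is 64-bit, and the
largest n whose proximity matrix fits in a given amount of RAM, assuming
8 bytes per entry.  max_n_for_proximity is a hypothetical helper that
ignores every other allocation, so treat its result as an upper bound.

```r
# TRUE on a 64-bit build of R (pointers are 8 bytes), FALSE on 32-bit.
.Machine$sizeof.pointer == 8

# Hypothetical helper: largest n such that a dense n x n matrix of 8-byte
# doubles fits in 'budget_bytes'. Ignores all other memory use.
max_n_for_proximity <- function(budget_bytes) floor(sqrt(budget_bytes / 8))

max_n_for_proximity(2e9)    # 15811
max_n_for_proximity(16e9)   # 44721
```

Because the matrix grows with n squared, going from about 2 GB to 16 GB of
usable RAM only raises the ceiling from roughly 16,000 to roughly 45,000
rows.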

Andy

> -----Original Message-----
> From: r-help-bounces at r-project.org 
> [mailto:r-help-bounces at r-project.org] On Behalf Of Michael Lindgren
> Sent: Tuesday, September 07, 2010 4:28 PM
> To: r-help at r-project.org
> Subject: [R] RandomForests Limitations? Work Arounds?
> 
> Greetings,
> 
> I want to inquire about the memory limitations of the randomForest
> package.  I am attempting to perform clustering analysis using RF, but I
> keep getting the message that RF cannot allocate a vector of a given
> size.  I am currently using the 32-bit version of R to run this
> analysis.  Are there fewer memory issues when using the 64-bit version
> of R?  Mainly I want to be able to run RF on a very large dataset, but I
> keep having to take very small sample sizes to do so.  Any advice is
> more than appreciated.
> 
> Best,
> 
> Michael
> 
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide 
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
> 