[R] anyone know why package "randomForest" na.roughfix is so slow?

jim holtman jholtman at gmail.com
Thu Jul 1 02:26:26 CEST 2010


Use Rprof() to find out where the time is being spent.  The profile may
point to the bottleneck in the code.
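
A minimal sketch of what that could look like.  The function slow_fill()
and the data frame dat are made up purely for illustration (they are not
from the randomForest package); when profiling for real, replace the
slow_fill(dat) call with the na.roughfix() call on your own data.

```r
# Hypothetical stand-in for the slow call (an assumption, not package code).
# It fills NAs cell by cell, which is deliberately inefficient.
slow_fill <- function(df) {
  for (j in seq_along(df)) {
    for (i in seq_len(nrow(df))) {
      if (is.na(df[i, j])) df[i, j] <- median(df[[j]], na.rm = TRUE)
    }
  }
  df
}

# Made-up test data: 5000 rows, 20 numeric columns, NAs in the first five.
set.seed(1)
dat <- as.data.frame(matrix(rnorm(5000 * 20), ncol = 20))
dat[sample(nrow(dat), 500), 1:5] <- NA

prof_file <- tempfile(fileext = ".out")
Rprof(prof_file, interval = 0.01)   # start sampling the call stack
fixed <- slow_fill(dat)
Rprof(NULL)                         # stop profiling

prof <- summaryRprof(prof_file)
head(prof$by.self)                  # functions ranked by time spent in themselves
head(prof$by.total)                 # functions ranked by time including callees
```

The by.self table is usually the quickest way to see which low-level
function dominates; by.total shows which of your own calls it sits under.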

On Wed, Jun 30, 2010 at 7:53 PM, Mike Williamson <this.is.mvw at gmail.com> wrote:
> Hi all,
>
>    I am using the "randomForest" package for random forest predictions.  I
> like the package.  However, I have fairly large data sets, and it can often
> take *hours* just to get through the "na.roughfix" call, which simply
> replaces any NA values with either the median (numerical data) or the most
> frequent value (factors).
>    I am going to start doing some comparisons between na.roughfix() and
> some apply() functions which, it seems, are able to do the same job more
> quickly.  But I hesitate to duplicate a function that is already in the
> package, since I presume the na.roughfix should be as quick as possible and
> it should also be well "tailored" to the requirements of random forest.
>
>    Has anyone else seen that this is really slow?  (I haven't noticed
> rfImpute to be nearly as slow, but I cannot say for sure:  my "predict" data
> sets are MUCH larger than my model data sets, so cleaning the prediction
> data set simply takes much longer.)
>    If so, any ideas how to speed this up?
>
>                              Thanks!
>                                   Mike

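For the comparison against an apply()-style approach mentioned above, a
column-wise fill might be sketched as follows.  The name rough_fill() is
hypothetical, and this is not the package's implementation: it only
follows the description in the quoted message (median for numeric
columns, most frequent level for factors), and it breaks ties between
factor levels by taking the first one.

```r
# Sketch of a column-wise NA fill; rough_fill is a made-up name.
rough_fill <- function(df) {
  df[] <- lapply(df, function(col) {
    miss <- is.na(col)
    if (!any(miss)) return(col)               # nothing to fill in this column
    if (is.numeric(col)) {
      col[miss] <- median(col, na.rm = TRUE)  # numeric: column median
    } else if (is.factor(col)) {
      counts <- table(col)                    # table() ignores NA by default
      col[miss] <- names(counts)[which.max(counts)]  # most frequent level
    }
    col                                       # other column types are left as-is
  })
  df
}

# Small made-up example
d <- data.frame(x = c(1, NA, 3, 10),
                f = factor(c("a", "b", NA, "b")))
fixed_small <- rough_fill(d)
fixed_small
```

Each column is processed once as a whole vector, so the cost grows with
the number of columns rather than the number of cells.  Timing it against
na.roughfix() on the same data with system.time() would show whether the
difference is real for your case.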


-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?


