[R] outliers using Random Forest

Edgar Acuna edgar at cs.uprm.edu
Mon Apr 19 10:58:38 CEST 2004

Dear Andy,
Thanks for your quick answer. I increased the number of trees and the
outlyingness measure got more stable. But still I do not know if I am
working with the raw measure or with the normalized measure mentioned
in Breiman's Wald lecture. The normalized measure nout is

where med is the median of the class containing the case corresponding
to nout.
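
To make the raw versus normalized distinction concrete, here is a minimal sketch. It assumes the commonly described form of the measure: the raw score of a case is the number of cases divided by the sum of its squared proximities to the other cases of its class, and the normalized score subtracts the class median (med) and divides by the absolute deviation from that median. The function name outlier_scores, the toy proximity matrix, the use of the mean (rather than median) absolute deviation, and the handling of a zero sum are all assumptions for illustration. It is plain Python, not the code inside randomForest.

```python
from statistics import median


def outlier_scores(prox, labels):
    """Return (raw, normalized) outlier scores from a proximity matrix.

    prox   : square list of lists with proximities in [0, 1]
    labels : class label of each case (use one label for unsupervised runs)
    """
    n = len(prox)
    raw = []
    for i in range(n):
        # Sum of squared proximities to the other cases of the same class.
        s = sum(prox[i][j] ** 2
                for j in range(n)
                if j != i and labels[j] == labels[i])
        # Assumption: a zero sum is treated as 1 to avoid dividing by zero.
        raw.append(n / (s if s > 0 else 1.0))

    normalized = [0.0] * n
    for cls in set(labels):
        idx = [i for i in range(n) if labels[i] == cls]
        med = median(raw[i] for i in idx)
        # Assumption: mean absolute deviation from the class median.
        dev = sum(abs(raw[i] - med) for i in idx) / len(idx)
        for i in idx:
            normalized[i] = (raw[i] - med) / dev if dev > 0 else 0.0
    return raw, normalized


# Toy data: cases 0-2 are close to each other, case 3 is far from all.
prox = [
    [1.0, 0.8, 0.8, 0.1],
    [0.8, 1.0, 0.8, 0.1],
    [0.8, 0.8, 1.0, 0.1],
    [0.1, 0.1, 0.1, 1.0],
]
raw, norm = outlier_scores(prox, ["a", "a", "a", "a"])
print(raw)
print(norm)
```

The raw scores depend on the number of cases and on the overall scale of the proximities, so they are hard to compare across runs or data sets. The normalized scores are centered at the class median, which is why it matters which of the two a given option returns.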

Best regards
Edgar Acuna

On Sun, 18 Apr 2004, Liaw, Andy wrote:

> The thing to do is probably:
> 1. Use a fairly large number of trees (e.g., 1000).
> 2. Run a few times and average the results.
> The reason for the instability is twofold:
> 1. The random forest algorithm itself is based on randomization.  That's why
> it's probably a good idea to have 500-1000 trees to get more stable
> proximity measures (on which the outlying measures are based).
> 2. If you are running randomForest in unsupervised mode (i.e., not giving it
> the class labels), then the program treats the data as "class 1", creates a
> synthetic "class 2", and runs the classification algorithm to get the
> proximity measures.  You probably need to run the algorithm a few times so
> that the result will be based on several simulated data sets, instead of just
> one.
> HTH,
> Andy
> > From: Edgar Acuna
> >
> > Hello,
> > Does anybody know if the outscale option of randomForest yields the
> > standardized version of the outlier measure for each case, or are the
> > results only the raw values? Also, I have noticed that this measure shows
> > very high variability: if I repeat the experiment I get very different
> > values for this measure, and it is hard to flag the outliers.
> > This does not happen with the two other criteria that I am using, LOF and
> > Bay's Orca; I get several cases that can be considered outliers
> > with both approaches.
> > I ran my experiments using the Bupa and Diabetes data sets available at
> > the UCI Machine Learning Repository.
> >
> > Thanks in advance for any response.