[R] some thoughts on outlier detection, need help!

Spencer Graves spencer.graves at pdf.com
Sun Aug 7 04:10:31 CEST 2005


	  I'm not certain what you are asking.  PLEASE do read the posting 
guide! "http://www.R-project.org/posting-guide.html".  If you formulate 
your question in terms of a simple example, showing where you got stuck 
as suggested in the posting guide, it might help others understand your 
question and inspire suggestions.

	  TANSTAAFL = There ain't no such thing as a free lunch (Heinlein, The 
Moon Is a Harsh Mistress)

	  spencer graves

Weiwei Shi wrote:

> Dear listers:
> I have an idea for outlier detection, and I need to implement it in R
> first. I hope I can get some input from all the gurus here.
> 
> I chose a distance-based approach:
> step 1:
> Calculate the distance between every pair of rows of a data frame.
> Considering the different scales of the variables, I chose the
> Mahalanobis distance, using the variance as the scaler.
> 
> step 2:
> Let k be the number of points in one "cluster". k is decided by
> answering the following question: how many neighbors does a point
> need in order not to be an outlier?
> 
> For each point, take the smallest (k-1) distances from step 1. Among
> those (k-1) distances, take the maximum for that point.
> 
> step 3:
> Get the distribution of those maxima over all the points. Thus, the
> multivariate problem becomes a univariate one. An outlier among
> those maxima then marks the corresponding point as an outlier.
> 
> My questions are:
> 1. I don't know whether using Mahalanobis is proper, since most
> clustering algorithms implemented in R (like pam or clara) use
> Euclidean or Manhattan distances.
> 2. Is there a way to get the Mahalanobis distance matrix for every
> pair of rows of a data frame or matrix?
> 3. My approach allows a point to belong to more than one
> k-cluster. Is there a similar algorithm in R, or a published one?
> 
> Thanks for any suggestions,
> 
> weiwei
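
For readers of the archive, here is a minimal sketch in R of the quoted
steps 1 to 3, including the pairwise Mahalanobis matrix asked about in
question 2. It is only one possible reading of the description, not a
tested method. The function names pairwise_mahalanobis and knn_score, the
toy data, the choice k = 5 and the boxplot-style cutoff are all assumptions
made for illustration. Note that stats::mahalanobis() returns squared
distances from each row to a single center, so the sketch instead whitens
the data and calls dist() to get all pairs.

```r
# Toy data (assumption): 50 bivariate normal points plus one planted point.
set.seed(1)
x <- rbind(matrix(rnorm(100), ncol = 2), c(8, 8))

# Step 1: pairwise Mahalanobis distances (hypothetical helper).
# Whiten the rows with the Cholesky factor of the sample covariance;
# Euclidean distance on the whitened rows equals Mahalanobis distance on x.
pairwise_mahalanobis <- function(x) {
  x <- as.matrix(x)
  R <- chol(cov(x))        # cov(x) = t(R) %*% R, with R upper triangular
  z <- x %*% solve(R)      # whitened rows
  as.matrix(dist(z))       # n x n symmetric distance matrix
}

# Step 2: for each point, the largest of its (k - 1) smallest distances
# to other points, i.e. the distance to its (k - 1)-th nearest neighbour.
knn_score <- function(d, k) {
  diag(d) <- Inf           # a point is not its own neighbour
  apply(d, 1, function(row) sort(row)[k - 1])
}

d <- pairwise_mahalanobis(x)
score <- knn_score(d, k = 5)   # k = 5 is an arbitrary choice

# Step 3: treat the scores as a univariate sample and flag large values.
# The 1.5 * IQR rule is an assumed cutoff; the original post names none.
flagged <- which(score > quantile(score, 0.75) + 1.5 * IQR(score))
flagged
```

The covariance here is estimated from all rows, outliers included, so a
robust covariance estimate may be worth considering for real data.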

-- 
Spencer Graves, PhD
Senior Development Engineer
PDF Solutions, Inc.
333 West San Carlos Street Suite 700
San Jose, CA 95110, USA

spencer.graves at pdf.com
www.pdf.com <http://www.pdf.com>
Tel:  408-938-4420
Fax: 408-280-7915



