[R] Outlier Detection with k-Means

Boris Steipe boris.steipe at utoronto.ca
Wed May 7 18:01:36 CEST 2014


Oops.
> (ii)   Your distance calculation is not the cartesian distance. That would be:
>       sqrt(rowSums(iris2[1,]^2 - centers[1,]^2)). 
Strike that: your sqrt(rowSums((iris2 - centers)^2)) is already the Euclidean distance. Need more coffee.
:-O
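
For the relative distance asked about in the quoted message below, here is a minimal sketch rather than a definitive answer. It assumes "average distance" means the arithmetic mean of the distances within each cluster. The names cluster.means and rel.distances and the seed value are illustrative, not from the original post.

```r
# Sketch only: cluster.means, rel.distances and the seed are illustrative choices.
set.seed(42)  # kmeans() starts from random centers; fix the seed for repeatability

# remove species from the data to cluster
iris2 <- iris[, 1:4]
kmeans.result <- kmeans(iris2, centers = 3)

# absolute distance of every object to its own cluster center
centers <- kmeans.result$centers[kmeans.result$cluster, ]
distances <- sqrt(rowSums((iris2 - centers)^2))

# mean distance within each cluster, expanded to one value per object
# (ave() applies mean() per group by default)
cluster.means <- ave(distances, kmeans.result$cluster)

# relative distance: absolute distance divided by the cluster's mean distance
rel.distances <- distances / cluster.means

# five objects with the largest relative distance
outliers <- order(rel.distances, decreasing = TRUE)[1:5]
print(outliers)
```

By construction the relative distances average to 1 within each cluster, so values well above 1 mark objects that are unusually far from their center for that cluster. Without set.seed() the clustering, and hence the list of outliers, can change between runs.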



> On 2014-05-07, at 4:34 AM, marioger wrote:
> 
>> Hi,
>> 
>> I am hoping you can help me with my problem. I am trying to detect outliers
>> using the kmeans algorithm. First I run the algorithm and flag as possible
>> outliers those objects that lie far from their cluster center. Instead of
>> the absolute distance I want to use the relative distance, i.e. the ratio
>> of an object's absolute distance to its cluster center and the average
>> distance of all objects in that cluster to the center. The code for outlier
>> detection based on absolute distance is the following:
>> 
>>> # remove species from the data to cluster
>>> iris2 <- iris[,1:4]
>>> kmeans.result <- kmeans(iris2, centers=3)
>>> # cluster centers
>>> kmeans.result$centers
>>> # calculate distances between objects and cluster centers
>>> centers <- kmeans.result$centers[kmeans.result$cluster, ]
>>> distances <- sqrt(rowSums((iris2 - centers)^2))
>>> # pick top 5 largest distances
>>> outliers <- order(distances, decreasing=TRUE)[1:5]
>>> # who are outliers
>>> print(outliers)
>> 
>> But how can I use the relative instead of the absolute distance to find
>> outliers?
>> Thanks in advance.
>> 
>> Mario
>> 
>> 
>> 
>> --
>> View this message in context: http://r.789695.n4.nabble.com/Outlier-Detection-with-k-Means-tp4690098.html
>> Sent from the R help mailing list archive at Nabble.com.
>> 
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.


