[R] agglomerative coefficient in agnes (cluster)

Liaw, Andy andy_liaw at merck.com
Thu Jan 27 04:43:52 CET 2005



> -----Original Message-----
> From: Weiguang Shi
> 
> Thanks again Andy.
> 
> The definition of AC is understood, yet I have trouble
> picturing the amount of "clear clustering structure"
> it measures. To put things into perspective, for two
> series 
>    1,2,1000,1001
> and 
>    1,2,3,1000
> agnes(x, method="single") generates ac values of 
> 0.998998 and 0.7492477 respectively, yet it seems to
> me that both have fairly clear clustering structures.
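
For anyone following along, the two values quoted above can be reproduced with a short sketch like the one below. It assumes the cluster package is installed; the names s1, s2, ac1 and ac2 are only illustrative, and each series is passed as a one-column matrix to keep the input unambiguous.

```r
library(cluster)

# The two four-point series from the question, as one-column matrices.
s1 <- matrix(c(1, 2, 1000, 1001), ncol = 1)
s2 <- matrix(c(1, 2, 3, 1000), ncol = 1)

# Single-linkage agglomerative coefficient for each series.
ac1 <- agnes(s1, method = "single")$ac
ac2 <- agnes(s2, method = "single")$ac

print(c(ac1, ac2))  # about 0.998998 and 0.7492477
```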

It has to do with sample sizes.  Consider the following:

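testAC <- function(prop1=0.5, x=rnorm(50), center=c(0, 100), ...) {
    ## Shift the first n1 = ceiling(n * prop1) values by center[1] and the
    ## remaining n2 values by center[2], then return the AC from agnes().
    stopifnot(require(cluster))
    n <- length(x)
    n1 <- ceiling(n * prop1)
    n2 <- n - n1
    agnes(x + rep(center, c(n1, n2)), ...)$ac
}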

Now some tests (x is a numeric vector defined beforehand in the session,
e.g. x <- rnorm(50)):

> sapply(c(.25, .5), testAC, x=x[1:4], method="single")
[1] 0.7427591 0.9862944
> sapply(1:5 / 10, testAC, x=x[1:10], method="single")
[1] 0.8977139 0.9974224 0.9950061 0.9946366 0.9946366
> sapply(1:5 / 10, testAC, x=x, method="single")
[1] 0.9982955 0.9969757 0.9971114 0.9971127 0.9975111

So it seems that AC does not count an isolated singleton as cluster
structure: a point that is merged only at the final step contributes 0
to the average.  This is only discernible in small samples, though.
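
To make the singleton effect concrete, here is a rough sketch that computes AC by hand from its definition: the mean over observations of 1 - m(i), where m(i) is the dissimilarity at which observation i is first merged, divided by the dissimilarity of the final merge. It uses hclust() from base R rather than agnes(), and ac_by_hand is a made-up helper name, not something from the cluster package.

```r
# Hand computation of the agglomerative coefficient (AC) using base R only.
# ac_by_hand is a hypothetical helper, not part of the cluster package.
ac_by_hand <- function(x, method = "single") {
    hc <- hclust(dist(x), method = method)
    # In hc$merge, a negative entry -i marks the step at which observation i
    # is merged for the first time; hc$height[step] is that dissimilarity.
    first_step <- sapply(seq_along(x),
                         function(i) row(hc$merge)[hc$merge == -i])
    # m(i): first-merge dissimilarity relative to the final merge.
    m <- hc$height[first_step] / max(hc$height)
    mean(1 - m)
}

ac_by_hand(c(1, 2, 1000, 1001))  # about 0.998998
ac_by_hand(c(1, 2, 3, 1000))     # about 0.7492477
```

In the second series the point 1000 joins only at the last merge, so its m(i) is 1 and it adds 0 to the average; the other three points each add 1 - 1/997, giving roughly 3/4.  With n points and one such singleton the ceiling is (n - 1)/n, which is why the effect fades as n grows.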

Andy


 
>  --- "Liaw, Andy" <andy_liaw at merck.com> wrote: 
> > BTW, I checked the book.  You're not going to find much
> > more than that.
> > 
> Thanks for checking.
> 
> Weiguang



