[R] function 'silhouette' in package 'cluster'

Martin Maechler maechler at stat.math.ethz.ch
Fri Feb 7 14:19:02 CET 2003


>>>>> "Laurent" == Laurent Gautier <laurent at cbs.dtu.dk>
>>>>>     on Thu, 6 Feb 2003 09:37:27 +0100 writes:

    Laurent> Dear all,
    Laurent> I am trying (without much success) to use the fuction 'silhouette'.
    Laurent> Would anyone encountered that before (or would know where I am wrong ?)

Lauren has sent me his data in the mean time.
He hit a bug: I see that the silhouette.default()
method does not work when you have only two clusters.
{Note that this is quite new function/method anyway, so that bug
 only bites those who do want the new functionality!}

For the interested ones: The reason is a subtle version of the
famous ``forgotten drop=FALSE'' :

Current silhouette.default has a line
   diC <- apply(dmatrix[!iC, iC], 2, function(r) tapply(r, x[!iC], mean))
which usually returns a (k-1)x Nj matrix (k = #{clusters}),
but for the case k=2 returns a vector instead of a 1-row matrix.
The solution is to wrap the RHS into an  rbind(.).

Since, from Laurent's example, I found that I could improve the
function further (by allowing a direct `dmatrix' argument
alternatively to `dist'), I post here the new version of the
silhouette.default method. This will be in the next version of
the cluster package, but that will appear only by the end of
February or so.

Regards,
Martin


silhouette.default <- function(x, dist, dmatrix, ...) {
    cll <- match.call()
    if(!is.null(cl <- x$clustering)) x <- cl
    n <- length(x)
    if(!all(x == round(x))) stop("`x' must only have integer codes")
    k <- length(clid <- sort(unique(x)))
    if(k <= 1 || k >= n)
        return(NA)
    ## check dist/dmatrix
    if(missing(dist)) {
        if(missing(dmatrix))
            stop("Need either a dissimilarity `dist' or diss.matrix `dmatrix'")
        if(is.null(dm <- dim(dmatrix)) || length(dm) != 2 || !all(n == dm))
            stop("`dmatrix' is not a dissimilarity matrix compatible to `x'")
    } else { # `dist'
        dist <- as.dist(dist) # hopefully
        if(n != attr(dist, "Size"))
            stop("clustering `x' and dissimilarity `dist' are incompatible")
        dmatrix <- as.matrix(dist)# so we can apply(.) below
    }
    wds <- matrix(NA, n,3, dimnames =
                  list(names(x), c("cluster","neighbor","sil_width")))
    for(j in 1:k) { # j-th cluster:
        Nj <- sum(iC <- x == clid[j])
        wds[iC, 1] <- j
        a.i <- if(Nj > 1) colSums(dmatrix[iC, iC])/(Nj - 1) else 0 # length(a.i)= Nj
        ## minimal distances to points in all other clusters:
        diC <- rbind(apply(dmatrix[!iC, iC], 2,
                           function(r) tapply(r, x[!iC], mean)))# (k-1) x Nj
        wds[iC,"neighbor"]  <- clid[-j][minC <- max.col(-t(diC))]
        b.i <- diC[cbind(minC, seq(minC))]
        s.i <- (b.i - a.i) / pmax(b.i, a.i)
        wds[iC,"sil_width"] <- s.i
    }
    attr(wds, "Ordered") <- FALSE
    attr(wds, "call") <- cll
    class(wds) <- "silhouette"
    wds
}


    Laurent> Please find below the R ouput. Thanks in advance,
    Laurent> L.


    >> s <- silhouette(ct, as.dist(metric))
    Laurent> Error in "[<-"(*tmp*, iC, "sil_width", value = s.i) : 
    Laurent> number of items to replace is not a multiple of replacement length
    Laurent> In addition: Warning messages: 
    Laurent> 1: longer object length
    Laurent> is not a multiple of shorter object length in: b.i - a.i 
    Laurent> 2: number of rows of result
    Laurent> is not a multiple of vector length (arg 2) in: cbind(mmm, as.vector(each)) 
    >> 
    >> traceback()
    Laurent> 2: silhouette.default(ct, as.dist(metric))
    Laurent> 1: silhouette(ct, as.dist(metric))

    >> str(ct)
    Laurent> Named int [1:2381] 1 1 1 1 1 2 2 2 1 1 ...
    Laurent> - attr(*, "names")= chr [1:2381] "5153" "22" "5185" "356" ...
    >> str(metric)
    Laurent> num [1:2381, 1:2381] 0.000 1.438 1.172 0.751 0.432 ...
    Laurent> - attr(*, "dimnames")=List of 2
    Laurent> ..$ : chr [1:2381] "5153" "22" "5185" "356" ...
    Laurent> ..$ : chr [1:2381] "5153" "22" "5185" "356" ...




More information about the R-help mailing list