[R] error using daisy() in library(cluster). Bug?

Martin Maechler maechler at stat.math.ethz.ch
Thu Aug 12 17:59:21 CEST 2004

[Reverted back to R-help, after private exchange]

>>>>> "MM" == Martin Maechler <maechler at stat.math.ethz.ch>
>>>>>     on Thu, 12 Aug 2004 17:12:01 +0200 writes:

>>>>> "javier" == javier garcia - CEBAS <rn001 at cebas.csic.es>
>>>>>     on Thu, 12 Aug 2004 16:28:27 +0200 writes:

    javier> Martin; Yes I know that there are variables with all
    javier> five values 'NA'. I've left them as they are just
    javier> because of saving a couple of lines in the script,
    javier> and because I like to see that they are there,
    javier> although all values are 'NA'.  I don't expect they
    javier> are used in the analysis, but are they the source of
    javier> the problem?

    MM> yes, but only because of "stand = TRUE".

    MM> Yes, one could imagine that it would be good if
    MM> standardizing these "all NA variables" just worked.

    MM> I'll think a bit more about it.  Thank you for the
    MM> example.

Ok. I've thought (and looked at the R code) a bit longer.
Also considered the fact (you mentioned) that this worked in R 1.8.0.
Hence, I'm considering the current behavior a bug.

Here is the patch (apply to cluster/R/daisy.q in the *source*,
 or at the appropriate place in <cluster_installed>/R/cluster):

--- daisy.q	2004/06/25 16:17:47	1.17
+++ daisy.q	2004/08/12 15:23:26
@@ -78,8 +78,8 @@
     if(all(type2 == "I")) {
 	if(stand) {
             x <- scale(x, center = TRUE, scale = FALSE) #-> 0-means
-            sx <- colMeans(abs(x))
-            if(any(sx == 0)) {
+	    sx <- colMeans(abs(x), na.rm = TRUE)# can still have NA's
+	    if(0 %in% sx) {
                 warning(sQuote("x"), " has constant columns ",
                         pColl(which(sx == 0)), "; these are standardized to 0")
                 sx[sx == 0] <- 1
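
To see why both changes in the patch matter, here is a minimal sketch
on made-up data (a hypothetical 5 x 3 matrix, not Javier's 5 x 180
data frame). It only mimics the two patched lines and does not call
daisy() itself; the variable names are illustrative assumptions.

```r
# Toy numeric matrix (hypothetical data): column 'b' is entirely NA.
x <- cbind(a = c(1, 2, 3, 4, 5),
           b = NA_real_,
           c = c(2, 4, 4, 6, 9))

# Centre the columns, as the stand = TRUE branch does.
x <- scale(x, center = TRUE, scale = FALSE)

# Old code: a column containing NA gets an NA mean, so the comparison
# with 0 is NA and any() returns NA when no column is exactly 0.
sx_old   <- colMeans(abs(x))
old_test <- any(sx_old == 0)

# Using an NA condition in if() signals an error; capture its message.
old_result <- tryCatch(if (old_test) "constant" else "ok",
                       error = function(e) conditionMessage(e))

# Patched code: na.rm = TRUE copes with partly missing columns, but an
# all-NA column still gives NaN, so '==' would again produce NA.
# %in% is based on match() and returns TRUE or FALSE, never NA.
sx_new   <- colMeans(abs(x), na.rm = TRUE)
new_test <- 0 %in% sx_new

print(sx_old)
print(old_result)
print(sx_new)
print(new_test)
```

The point of the sketch is that na.rm = TRUE alone would not be
enough: the all-NA column still yields a non-finite mean, which is
why the test is rewritten with %in% as well.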

Thank you for helping to find and fix this bug.
Martin Maechler, ETH Zurich, Switzerland

    javier> On Thu 12 Aug 2004 15:11, MM wrote:

    >>> Javier, I could well read your .RData and try your
    >>> script to produce the same error from daisy().
    >>> Your dataframe is of dimension 5 x 180 and has many
    >>> variables that have all five values 'NA' (see below).
    >>> You don't expect to use these, do you?  Martin
