[R] Why daisy() in cluster library failed to exclude NA when computing dissimilarity

Martin Maechler maechler at stat.math.ethz.ch
Mon Dec 9 11:36:04 CET 2013


>>>>> Gundala Viswanath <gundalav at gmail.com>
>>>>>     on Sun, 8 Dec 2013 16:11:12 +0900 writes:

    > Hi, According to the documentation of the daisy function
    > in the cluster package, it can compute dissimilarities
    > when NA (missing) values are present.

    > http://stat.ethz.ch/R-manual/R-devel/library/cluster/html/daisy.html

    > But when I tried this code:

    > library(cluster)
    > x <- c(1.115,NA,NA,0.971,NA)
    > y <- c(NA,1.006,NA,NA,0.645)
    > df <- as.data.frame(rbind(x,y))
    > daisy(df,metric="gower")

    > It gave this message:

    > Dissimilarities :
    >    x
    > y NA

    > Metric :  mixed ;  Types = I, I, I, I, I
    > Number of objects : 2
    > Warning messages:
    > 1: In min(x) : no non-missing arguments to min; returning Inf
    > 2: In max(x) : no non-missing arguments to max; returning -Inf

    > I welcome alternatives to gower.

    > I expected the dissimilarity output to be a non-NA value, e.g. 0.
    > What's the right way to do it?

Thank you, Gundala, for using a simple reproducible example.

Had you read the documentation about Gower's distance a bit
further, you would have found that it works by giving weight
zero to each *pair* of variable values in which at least one of
the two values is missing.

In situations like yours, *every* pair has at least one missing
value, so there's no way to get a non-NA distance.
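
The weighting rule can be sketched in a few lines of base R. This is
only an illustration under simplifying assumptions, not the code used
inside daisy(): gower_pair() is a hypothetical helper, all variables are
treated as numeric with unit weights w_k = 1, and x2 is made-up data that
adds one value to your x so that the two rows share an observed variable.

```r
# Hypothetical helper (not part of 'cluster'): Gower dissimilarity between
# rows i and j of a numeric matrix m, with unit variable weights.
gower_pair <- function(m, i, j) {
  # delta(ij;k) is TRUE only if variable k is observed in BOTH rows
  delta <- !is.na(m[i, ]) & !is.na(m[j, ])
  if (!any(delta)) return(NA_real_)      # all weights zero -> NA
  # Range of each contributing variable, ignoring missing values
  rng <- apply(m[, delta, drop = FALSE], 2,
               function(col) diff(range(col, na.rm = TRUE)))
  d <- abs(m[i, delta] - m[j, delta]) / rng
  d[rng == 0] <- 0                       # assumption: constant column adds 0
  sum(d) / sum(delta)
}

x  <- c(1.115, NA, NA, 0.971, NA)
y  <- c(NA, 1.006, NA, NA, 0.645)
x2 <- c(1.115, 1.2, NA, 0.971, NA)       # made-up: shares variable 2 with y

m1 <- rbind(x, y)
m2 <- rbind(x2, y)

gower_pair(m1, 1, 2)   # NA: no variable is observed in both rows
gower_pair(m2, 1, 2)   # 1: only variable 2 counts, and with two rows
                       #    the difference equals the range
```

With only two rows, any shared variable necessarily gives a scaled
difference of 1 (or 0 if the values are equal), so a non-NA result needs
at least one variable observed in both rows, and a meaningful one needs
more than two observations.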

*AND* the documentation already contains this, at the very end
of the section 'Details':

  If all weights w_k delta(ij;k) are zero, the dissimilarity is set to ‘NA’.

I.e., we have

> install.packages("fortunes")
> library(fortunes)
> fortune("WTFM")

This is all documented in TFM. Those who WTFM don't want to have to WTFM again
on the mailing list. RTFM.
   -- Barry Rowlingson
      R-help (October 2003)

... which I now did in spite of Barry's excellent point
... let's say it's because of approaching Christmas !

Martin Maechler,
ETH Zurich


