[R] Cluster package broken in 1.4.0? -- no!

Martin Maechler maechler at stat.math.ethz.ch
Tue Jan 29 09:35:47 CET 2002


>>>>> "Petros" == Petros Tsantoulis <ptsant at otenet.gr> writes:

    Petros> Greetings,

    Petros> I am reasonably experienced with R but I recently
    Petros> tried to do some clustering using the "cluster"
    Petros> package, in order to see if it would help.

    Petros> I only tried this once with the 1.3.1 version and it
    Petros> worked (I don't quite remember which method I used).

Not with the example below, it can't have!

    Petros> Now, I tried with the 1.4.0 version and no
    Petros> clustering function seems to work with matrices that
    Petros> contain NAs, even though the help page says it
    Petros> should. I even tried the same data that worked with
    Petros> 1.3.1.

    Petros> For example :

This defines a vector foo, but you want a data matrix, don't
you? Or are we talking about 1-D observations?
If so, forget about using cluster with NAs!
	{There is an easy way around that, though; ask in a separate
	 e-mail if that is what you have and want.}

(redone by MM so that it can easily be cut and pasted) :

foo <-
  c(68,NA,33,63,53,62,44,NA,20,69,NA,62,59,43,51,19,38,57,30,53,62,67,42,31,38,
    50,NA,69,67,38,NA,26,NA,52,39,45,42,58,79,92,53,NA,22,21,30,38,64,49,43,28,
    33,42,59,32,41,52,44,54,37,43,32,42,59,39,74,38,33,56,NA,52,38,46,42,29,58,
    54,62,32,53,39,28,34,24,44,46,27,38)
str(foo)# length 87
foomat <- matrix(foo, ncol = 3) # now have data MATRIX !

fanny(foo, k=2, diss=FALSE)
## Error in fanny(foo, k = 2, diss = FALSE) :
##         No clustering performed, NA-values in the dissimilarity matrix.

fanny(foomat, k=2, diss=FALSE)# same error!


    Petros> The help page says :

    Petros> 	In case of a matrix or dataframe, each row
    Petros> corresponds to an observation, and each column
    Petros> corresponds to a variable. All variables must be
    Petros> numeric. Missing values (NAs) are allowed.

And the help page should probably add ``but not too many!''

    Petros> This happens with every (?) clustering function that
    Petros> I tried.

    Petros> Am I doing something wrong? 
(yes)

The help page(s) should (and will) be improved; and yes, the NA
handling is far from perfect. Here R is still doing just the
same thing as the original code by Rousseeuw et al.
As said above, NAs are only allowed if there are not too many,
i.e., every observation must still have enough non-NA entries
that a distance (dissimilarity) to every other observation can
be computed, either via the daisy() function in R or via the
"dysta()" subroutine used internally.

As the help pages say, if you have ``diss = TRUE'', 
no NAs are allowed.
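
The ``not too many NAs'' condition can be checked before calling any
clustering function. The following is only a sketch in base R, assuming
that a dissimilarity between two observations is computable exactly when
they have at least one variable observed in both. shared_vars() is a
hypothetical helper, not part of the cluster package; the three rows are
observations 2, 4 and 11 of the data matrix built from foo above.

```r
## Hypothetical helper (NOT in the cluster package): for every pair of rows
## of x, count the variables that are non-NA in both rows.
shared_vars <- function(x) {
  obs <- (!is.na(as.matrix(x))) * 1  # 1 where a value is present, 0 where NA
  tcrossprod(obs)                    # [i, j] = number of columns observed in both i and j
}

x <- rbind(c(NA, NA, 43),   # only variable 3 observed
           c(63, NA, 42),   # variables 1 and 3 observed
           c(NA, 92, NA))   # only variable 2 observed

n_shared <- shared_vars(x)

## Pairs (i < j) with no variable in common; assumption: these are the pairs
## for which no dissimilarity can be computed.
bad <- which(n_shared == 0 & upper.tri(n_shared), arr.ind = TRUE)
bad
```

Any pair listed in `bad' has no common non-NA variable, so a data
matrix containing such a pair would be expected to trigger the
``NA-values in the dissimilarity matrix'' error.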

I continue your example, assuming your foo consists of 29
3-dimensional observations:
 foodist <- daisy(foomat)
 str(foodist)
 ## Classes 'dissimilarity', 'dist' atomic [1:406] 10.39 37.34 8.66 30.08 6.40...
 ##-   ..- attr(*, "NA.message")= chr "NA-values in the dissimilarity matrix !"
 ##                                    =======================================
 ##-   ..- attr(*, "Size")= int 29
 ##-   ..- attr(*, "Metric")= chr "euclidean"
 which(is.na(as.matrix(foodist)), arr = TRUE)
 ##-    row col
 ##- 11  11   2
 ##- 11  11   4
 ##- 2    2  11
 ##- 4    4  11
 ##- 13  13  11
 ##- 11  11  13

 ##--> Every NA pair involves observation 11, so leaving it out will save us!

 foo.m11 <- foomat[ -11, ]
 str(foodm11 <- daisy(foo.m11)) # no "NA message"

 f11d <- fanny(foodm11, k=2, diss = TRUE)# now works
 f11x <- fanny(foo.m11, k=2, diss = FALSE)# now works

 ii <- c(1:4,7)
 all.equal(f11x[ii], f11d[ii]) ##-> TRUE
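
The manual step above (spotting that observation 11 is involved in every
NA dissimilarity) can also be automated. What follows is only a sketch,
not something the cluster package does for you: a greedy loop that
repeatedly drops the observation with the most NA dissimilarities. The
names `keep', `n_na' and `dropped' are made up for this illustration, and
the greedy rule is an assumption that need not give the smallest possible
set of dropped rows in general.

```r
library(cluster)  # provides daisy()

foo <-
  c(68,NA,33,63,53,62,44,NA,20,69,NA,62,59,43,51,19,38,57,30,53,62,67,42,31,38,
    50,NA,69,67,38,NA,26,NA,52,39,45,42,58,79,92,53,NA,22,21,30,38,64,49,43,28,
    33,42,59,32,41,52,44,54,37,43,32,42,59,39,74,38,33,56,NA,52,38,46,42,29,58,
    54,62,32,53,39,28,34,24,44,46,27,38)
foomat <- matrix(foo, ncol = 3)   # 29 observations, 3 variables

keep <- seq_len(nrow(foomat))     # row indices still in the data set
repeat {
  d    <- as.matrix(daisy(foomat[keep, , drop = FALSE]))
  n_na <- rowSums(is.na(d))       # number of NA dissimilarities per observation
  if (all(n_na == 0)) break       # every pairwise dissimilarity is defined
  keep <- keep[-which.max(n_na)]  # greedy assumption: drop the worst offender
}
dropped <- setdiff(seq_len(nrow(foomat)), keep)
dropped
```

For the data in this thread the loop should stop after one step with
observation 11 dropped, so that foomat[keep, ] is the same matrix as
foo.m11 above.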

--------

I hope this helps.

Quick Summary: 
 No, nothing about NA handling
 has changed in R or the cluster package recently.

Regards,
Martin  {maintainer of "cluster"},

Martin Maechler <maechler at stat.math.ethz.ch>	http://stat.ethz.ch/~maechler/
Seminar fuer Statistik, ETH-Zentrum  LEO C16	Leonhardstr. 27
ETH (Federal Inst. Technology)	8092 Zurich	SWITZERLAND
phone: x-41-1-632-3408		fax: ...-1228			<><