[R] Problems with hclust and/or cutree.

Rolf Turner r.turner at auckland.ac.nz
Fri May 30 02:33:13 CEST 2008


I have been attempting to do some work using hclust, and have run
into a (possibly subtle) problem.

The background is that I constructed a dissimilarity matrix ``d1''
(it involved something called the ``Jaccard similarity coefficient'';
I won't go into the details unless requested).  I then did

	d2 <- as.dist(d1)
	try <- hclust(d2,method="ward")
	plot(try,labels=FALSE)

After looking at the plot, I tried

	mmm <- cutree(try,h=7)

and got the error message

Error in cutree(try, h = 7) :
   the 'height' component of 'tree' is not sorted
(increasingly); consider applying as.hclust() first

I was much puzzled by this initially, since try is already an
``hclust'' object (I checked class(try)) but after a substantial amount
of hair-tearing I discovered that the entries of the height component
of try are constant over long stretches.  E.g. the first 54 entries are
0 (to the 7 printed decimal places).  This doesn't *seem* to be cause
for alarm --- the help says explicitly that height is a
*non-decreasing* sequence (but not necessarily a strictly increasing
one).

I checked

	with(try,all.equal(height,sort(height)))

and got

[1] TRUE

but order(try$height) is NOT equal to 1:745 (note that 746 is the
number of subjects in the data set).
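
A toy sketch of how both observations can hold at once (made-up
numbers, not my actual heights; I am only assuming that the culprit is
a tiny dip somewhere in height, which I have not verified on the real
tree).  all.equal() compares up to a tolerance, whereas is.unsorted()
and order() are exact:

```r
# Toy heights (hypothetical values, not the real data): the third entry
# dips below the second by about 1e-12.
h <- c(0, 1, 1 - 1e-12, 2)

# all.equal() compares only up to a tolerance (about 1.5e-8 by default),
# so the dip goes unnoticed.
near_sorted <- isTRUE(all.equal(h, sort(h)))

# Exact checks do notice it.
strictly_unsorted <- is.unsorted(h)
n_decreases <- sum(diff(h) < 0)

# order() is stable for ties, so it equals 1:n exactly when the vector
# is non-decreasing; here it is not.
order_is_identity <- identical(order(h), seq_along(h))

print(c(near_sorted = near_sorted,
        strictly_unsorted = strictly_unsorted,
        order_is_identity = order_is_identity))
print(n_decreases)
```

On the real object the analogous checks would be is.unsorted(try$height)
and which(diff(try$height) < 0).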

I have done an RSiteSearch() on "cutree" and turned up nothing that
seemed relevant.

Finally, I found that if I do

	try$height <- round(try$height,6)

then

	mmm <- cutree(try,h=7)

``works'' (without error).
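
For what it is worth, here is a minimal sketch of why rounding might
help, again on made-up numbers rather than my real heights and again
assuming the trouble is a dip far smaller than 1e-6.  It shows only
that rounding removes the dip in this toy case, not that it is safe in
general:

```r
# Toy heights (hypothetical values): a dip of about 1e-12 at position 3.
h <- c(0, 1, 1 - 1e-12, 2)
before <- is.unsorted(h)

# Rounding to 6 decimal places collapses the dip into a tie.
h6 <- round(h, 6)
after <- is.unsorted(h6)

print(c(before = before, after = after))
```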

Are there traps for young players in employing such a strategy?  What
should I really worry about?

If anyone wants to try it for themselves with the real distance
matrix, I can bundle it up and email it to them privately.

Thanks for any insights.

	cheers,

		Rolf Turner

