[R] Why is it not possible to cut a tree returned by Agnes or Diana by height?

Leszek Nowina
Sat Apr 13 17:36:32 CEST 2019


    > asdf = data.frame(x=c(1,2,3), y=c(4,5,6), z=c(7,8,9))
    > cutree(agnes(asdf), h=100)
    Error in cutree(agnes(asdf), h = 100) :
      the 'height' component of 'tree' is not sorted (increasingly)
    > cutree(diana(asdf), h=100)
    Error in cutree(diana(asdf), h = 100) :
      the 'height' component of 'tree' is not sorted (increasingly)

I'm not sure if I understand why this is the case.

This is what I want: Cluster stuff by the //distances//, **not** by
how many clusters I want to have.

If two things are further from each other than X, they should go to
different clusters. Otherwise, the same cluster.
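
A minimal sketch of that rule using base R's hclust, whose trees cutree accepts directly. The threshold X = 2 and the variable names are assumptions made for illustration. It also shows that the two halves of the rule can disagree: with complete linkage every cluster has a diameter of at most X, while with single linkage points in different clusters are more than X apart, and the two cuts need not coincide.

```r
# Toy data from the post: three equally spaced points.
asdf <- data.frame(x = c(1, 2, 3), y = c(4, 5, 6), z = c(7, 8, 9))
d <- dist(asdf)  # pairwise Euclidean distances

X <- 2           # hypothetical distance threshold

# Complete linkage: the merge height is the largest pairwise distance
# in the merged cluster, so cutting at X keeps every diameter <= X.
by_diameter <- cutree(hclust(d, method = "complete"), h = X)

# Single linkage: the merge height is the smallest distance between
# two clusters, so points left in different clusters are > X apart.
by_gap <- cutree(hclust(d, method = "single"), h = X)

print(by_diameter)
print(by_gap)
```

On this data the first and last rows are further apart than X, yet each is within X of the middle row, so the complete-linkage cut yields two clusters and the single-linkage cut yields one.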

Is what I'm asking for unreasonable? I imagine that if I were to
implement Agnes or Diana by hand, it would go like this: stop joining
clusters once the smallest distance between any pair of clusters is
larger than X (Agnes), or stop dividing clusters once no cluster has a
diameter larger than X (Diana). But since both methods always
join/divide to the very end, I thought that calling cutree with a
height argument would give me what I need. It doesn't.
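
One thing that may be worth trying, offered as a sketch rather than a confirmed explanation: cutree is documented for trees as produced by hclust, and the cluster package provides as.hclust methods for agnes and diana objects. Converting first appears to give cutree the increasingly sorted heights it checks for. The threshold h = 2 is an arbitrary value chosen for illustration.

```r
library(cluster)  # provides agnes(), diana() and their as.hclust methods

asdf <- data.frame(x = c(1, 2, 3), y = c(4, 5, 6), z = c(7, 8, 9))

# Convert the agnes/diana results to hclust objects before cutting.
ag <- as.hclust(agnes(asdf))
di <- as.hclust(diana(asdf))

# Cut by height instead of by number of clusters.
ag_groups <- cutree(ag, h = 2)
di_groups <- cutree(di, h = 2)

# A very large height should put everything into one cluster.
ag_all <- cutree(ag, h = 100)
di_all <- cutree(di, h = 100)

print(ag_groups)
print(di_groups)
```

Note that agnes uses average linkage by default, so a cut at height h bounds the average distance between merged clusters, not the distance between every pair of points.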

Am I missing something?


