# [R] Help understanding cutree used for Dunn Index

Kerry kbrownk at gmail.com
Fri Dec 9 06:54:54 CET 2011

```
Basic question:

Is it correct to assume that when using cutree to set the number of
clusters (say k=4), cutree determines the clusters by the largest
distances among all potential clusters?

I've read the R help for cutree and am using it to define the number
of groups to obtain Dunn Index scores (using the clValid library) for
cluster analysis (using Euclidean distance and Ward's method).

I understand that cutree is used to set the number of clusters on
which the Dunn Index will base its score. But the R help doesn't
explain how the groups are determined. Prior to measuring the Dunn
Index, the cluster hierarchy formed using Euclidean distance and
Ward's method provides a certain number of connected pairs of samples.
For example:

Say at the 1st iteration (hierarchy level 1), my n=68 samples are
connected into k=32 groups. The next iteration connects these 32 into
k=16 groups (hierarchy level 2). 3rd iteration = 8; 4th iteration = 4,
and 5th iteration = 2. The distances from one hierarchy level to the
next will differ for each group.

Is it correct to assume that I could cut the tree into anywhere from
k=2 to k=32+16+8+4+2=62 groups? That is, cutree(data,k=2) through
cutree(data,k=62) is valid, whereas anything outside those values is
not?

Now say I use cutree(data,k=3) to define 3 clusters. Will cutree look
back at the cluster tree created by Ward's method and then take the 3
largest distance values from among these 62 potential groups, so that
when I use the Dunn index, those will be the only distances
considered?

I can post code and/or data if helpful.

Thanks,
kbrownk

```
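
The questions above can be checked empirically with a small R sketch. Nothing here comes from the poster's data or code: the 68 x 2 random matrix and the names `x`, `d`, `hc`, `g3` and `h_cut` are made up for illustration. It also assumes R >= 3.1.0, where Ward's criterion on unsquared Euclidean distances is `method = "ward.D2"` in `hclust`. The sketch looks at how many merges `hclust` records, which values of `k` `cutree` accepts, and how the `k = 3` partition relates to the merge heights. The last step calls `dunn()` from clValid only if that package is installed.

```r
# Toy data standing in for the 68 samples (assumption, not the poster's data)
set.seed(1)
x  <- matrix(rnorm(68 * 2), nrow = 68)
d  <- dist(x, method = "euclidean")
hc <- hclust(d, method = "ward.D2")

n <- nrow(x)

# hclust joins exactly one pair of clusters per step, so the tree
# records n - 1 merges rather than a halving at each level
nrow(hc$merge)

# cutree accepts any k from 1 to n; each k yields exactly k groups
sizes <- sapply(1:n, function(k) length(unique(cutree(hc, k = k))))
all(sizes == 1:n)

# k = 3 is the partition that exists just before the last two merges,
# i.e. a horizontal cut between the (n-3)th and (n-2)th merge heights
g3    <- cutree(hc, k = 3)
h_cut <- mean(hc$height[c(n - 3, n - 2)])
all(g3 == cutree(hc, h = h_cut))

# The Dunn index is then computed from the pairwise distances among
# the observations under that partition (only runs if clValid is installed)
if (requireNamespace("clValid", quietly = TRUE)) {
  print(clValid::dunn(distance = d, clusters = g3))
}
```

If this runs as expected, it suggests that `cutree` only returns group memberships (one integer label per sample), and that the Dunn index is evaluated on the original distance matrix for those memberships rather than on a handful of merge heights. Checking the same calls against the real data would confirm it.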