[R] Is it possible to obtain an agglomeration schedule with R cluster analysis

Bob Green bgreen at dyson.brisnet.org.au
Sun Feb 24 00:33:27 CET 2013


William,

Many thanks. I'll check this against my data tomorrow when I'm back 
at work.  This looks like just what I wanted.

Regards

Bob
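
On the earlier question of the height at which each case first joins a
cluster: one further option, offered only as a sketch, is cophenetic()
from stats. It returns, for every pair of cases, the height at which
the two are first placed in the same cluster, so the smallest value in
a case's row is the height of that case's first merge. The names coph
and first_join are illustrative, and this reading assumes a linkage
whose merge heights never decrease (as with "average").

```r
# Average-linkage clustering of the built-in USArrests data.
hc <- hclust(dist(USArrests), "average")

# Pairwise cophenetic distances: the height at which two cases first share a cluster.
coph <- as.matrix(cophenetic(hc))
diag(coph) <- NA  # ignore each case paired with itself

# For each case, the height of the first merge it takes part in.
first_join <- apply(coph, 1, min, na.rm = TRUE)
head(sort(first_join))
```

Every value in first_join should also appear in hc$height, which makes
it easy to match a case back to its row in the merge table.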


At 09:27 AM 24/02/2013, William Dunlap wrote:
>You didn't show what the tabular summary should look like.
>However, look at the height and merge components of
>an hclust object:
>
> > hc3 <- hclust(dist(USArrests[1:8, c(1,2,4)]))
> > data.frame(hc3[2:1])
>       height merge.1 merge.2
>1   9.297849      -1      -8
>2  13.609188      -2      -5
>3  23.779193      -4      -6
>4  33.865321      -3       2
>5  48.229659       1       3
>6 104.636227       4       5
>7 185.135221      -7       6
>The two merge.* columns identify which groups merged at
>the corresponding height value.  A negative value, -i, refers to the
>i'th leaf in the 'labels' component, and a positive value, i, refers
>to the cluster created in the i'th row of the data.frame.  The following
>function transforms those references into names:
>
>f <- function(hc){
>      data.frame(row.names=paste0("Cluster",seq_along(hc$height)),
>                 height=hc$height,
>                 components=ifelse(hc$merge<0,
>                                   hc$labels[abs(hc$merge)],
>                                   paste0("Cluster",hc$merge)),
>                 stringsAsFactors=FALSE)
>}
>
>as in
> > f(hc3)
>              height components.1 components.2
>Cluster1   9.297849      Alabama     Delaware
>Cluster2  13.609188       Alaska   California
>Cluster3  23.779193     Arkansas     Colorado
>Cluster4  33.865321      Arizona     Cluster2
>Cluster5  48.229659     Cluster1     Cluster3
>Cluster6 104.636227     Cluster4     Cluster5
>Cluster7 185.135221  Connecticut     Cluster6
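
The table above names sub-clusters such as "Cluster2" rather than their
contents. To list the leaves each merged cluster finally holds, the
merge matrix can be unrolled row by row. This is a minimal sketch;
cluster_members is a hypothetical helper name, not a function in stats,
and it relies on positive entries always pointing to earlier rows.

```r
# Hypothetical helper: leaf labels contained in each merged cluster.
cluster_members <- function(hc) {
  # Fall back to case numbers when the hclust object carries no labels.
  labs <- if (is.null(hc$labels)) {
    as.character(seq_len(nrow(hc$merge) + 1L))
  } else {
    hc$labels
  }
  members <- vector("list", nrow(hc$merge))
  for (i in seq_len(nrow(hc$merge))) {
    parts <- lapply(hc$merge[i, ], function(j) {
      # Negative entry: a single leaf; positive entry: a cluster built in row j.
      if (j < 0) labs[-j] else members[[j]]
    })
    members[[i]] <- unlist(parts)
  }
  names(members) <- paste0("Cluster", seq_along(members))
  members
}

hc3 <- hclust(dist(USArrests[1:8, c(1, 2, 4)]))
members3 <- cluster_members(hc3)
members3[["Cluster4"]]
```

For hc3 the Cluster4 entry should hold Arizona together with the two
members of Cluster2, matching the table above.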
>
>Compare that to the output of str(as.dendrogram(hc3)):
>
> > str(as.dendrogram(hc3))
>--[dendrogram w/ 2 branches and 8 members at h = 185]
>   |--leaf "Connecticut"
>   `--[dendrogram w/ 2 branches and 7 members at h = 105]
>      |--[dendrogram w/ 2 branches and 3 members at h = 33.9]
>      |  |--leaf "Arizona"
>      |  `--[dendrogram w/ 2 branches and 2 members at h = 13.6]
>      |     |--leaf "Alaska"
>      |     `--leaf "California"
>      `--[dendrogram w/ 2 branches and 4 members at h = 48.2]
>         |--[dendrogram w/ 2 branches and 2 members at h = 9.3]
>         |  |--leaf "Alabama"
>         |  `--leaf "Delaware"
>         `--[dendrogram w/ 2 branches and 2 members at h = 23.8]
>            |--leaf "Arkansas"
>            `--leaf "Colorado"
>
>Does f() produce the information you need for your display?
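
For a layout closer to a stage-by-stage agglomeration schedule, the
same merge and height components can be combined with the stage at
which each new cluster is next used. The sketch below is only an
approximation of that kind of table, not a reproduction of SPSS output;
agglomeration_schedule and its column names are hypothetical, and the
items keep R's coding (negative = single case, positive = cluster
formed at that earlier stage).

```r
# Hypothetical helper: a stage-by-stage schedule built from an hclust object.
agglomeration_schedule <- function(hc) {
  n_stage <- nrow(hc$merge)
  # Stage at which the cluster formed in row j is merged again (NA for the last stage).
  next_stage <- rep(NA_integer_, n_stage)
  for (i in seq_len(n_stage)) {
    for (j in hc$merge[i, ]) {
      if (j > 0) next_stage[j] <- i
    }
  }
  data.frame(
    stage      = seq_len(n_stage),
    item1      = hc$merge[, 1],
    item2      = hc$merge[, 2],
    height     = hc$height,
    next_stage = next_stage
  )
}

hc3 <- hclust(dist(USArrests[1:8, c(1, 2, 4)]))
sched <- agglomeration_schedule(hc3)
sched
```

Reading one row: the items in item1 and item2 are joined at the given
height, and next_stage says where the resulting cluster is merged again.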
>
>Bill Dunlap
>Spotfire, TIBCO Software
>wdunlap tibco.com
>
>
> > -----Original Message-----
> > From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of Bob Green
> > Sent: Saturday, February 23, 2013 12:49 PM
> > To: Uwe Ligges
> > Cc: r-help at r-project.org
> > Subject: Re: [R] Is it possible to obtain an agglomeration schedule with R cluster analysis
> >
> > Hello Uwe,
> >
> > Thanks. Re-reading the hclust help page, I found that, using the
> > 'USArrests' example data, the command plot(hc1) shows the
> > order in which cases joined. However, I still can't see how to obtain
> > the height at which each case joined a cluster, or the
> > height at which clusters merged.
> >
> >
> > The dendrogram {stats} help page provides the following code, which
> > produces the information that I require. However, what I would like
> > to obtain is a table of the height at which each cluster formed.
> >
> >  > hc <- hclust(dist(USArrests), "ave")
> >  > (dend1 <- as.dendrogram(hc)) # "print()" method
> >  > str(dend1)          # "str()" method
> >
> > I also found as.hclust which plots what I want, but I still can't
> > find a way to produce the actual height values which are being
> > plotted, for example as a tabular summary.
> >
> >   plot(hc) ;  mtext("hclust", side=1)
> >
> > Any assistance is appreciated,
> >
> > Bob
> >
> >
> >
> > At 04:01 AM 24/02/2013, Uwe Ligges wrote:
> >
> >
> > >On 22.02.2013 11:41, Bob Green wrote:
> > >>Hello,
> > >>
> > >>In SPSS the cluster analysis output includes an agglomeration schedule,
> > >>which details the stages when cases are joined.
> > >>
> > >>Is it possible to obtain such output when performing cluster analysis in
> > >>R?  If so, I'd appreciate advice regarding how to obtain this information.
> > >
> > >
> > >If you are talking about hierarchical clustering via hclust(), see ?hclust
> > >It tells you that the relevant information is available inside the
> > >object and you can even see it via the plot method.
> > >
> > >Uwe Ligges
> > >
> > >
> > >
> > >>
> > >>Any assistance is appreciated,
> > >>
> > >>Regards
> > >>
> > >>Bob
> > >>
> > >>______________________________________________
> > >>R-help at r-project.org mailing list
> > >>https://stat.ethz.ch/mailman/listinfo/r-help
> > >>PLEASE do read the posting guide
> > >>http://www.R-project.org/posting-guide.html
> > >>and provide commented, minimal, self-contained, reproducible code.
> >


