[R] cluster analysis

Martin Maechler maechler at stat.math.ethz.ch
Fri Oct 15 12:02:17 CEST 2004


>>>>> "ChrisH" == Christian Hennig <fm3a004 at math.uni-hamburg.de>
>>>>>     on Fri, 15 Oct 2004 11:43:53 +0200 (MEST) writes:

    ChrisH> Dear James,
    ChrisH> sorry, this is not really an answer.

nor this.  I'm answering Christian...

    ChrisH> I use cutree to obtain clusters from an hclust
    ChrisH> object.  I do not get from the identify help page
    ChrisH> that identify should do anything like what you
    ChrisH> expect it to do... I tried it out and to my surprise
well,
the reason is simple:  
There's been a nice identify.hclust() method for a long time,
and this is mentioned (including a link to the page) on the
?hclust page.
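
For readers who want the clusters without clicking, here is a minimal sketch of the cutree() route Christian mentions. The built-in USArrests data stands in for the poster's "drops" data frame, and k = 4 is an arbitrary choice; both are assumptions for illustration only. The resulting list is meant to be similar in shape to what identify() returns for an hclust object, not an exact reproduction of it.

```r
# Assumption: USArrests (shipped with R) stands in for the poster's "drops" data.
hc <- hclust(dist(USArrests))

# identify() on an hclust object dispatches to identify.hclust(), which is
# interactive: it needs an open dendrogram plot and mouse clicks, e.g.
#   plot(hc); classi <- identify(hc)

# Non-interactive alternative: cut the tree into k groups (k = 4 is arbitrary).
grp <- cutree(hc, k = 4)

# Build a list with one named index vector per cluster.
idx <- seq_along(grp)
names(idx) <- names(grp)
clusters_list <- split(idx, grp)

str(clusters_list)
```

Each list element holds the row indices of one cluster, with the row labels as names.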

    ChrisH> it behaved as you said, i.e., it indeed does
    ChrisH> something at least similar to what you want it to
    ChrisH> do, and that might be useful also for me. However, I
    ChrisH> wonder where you got the information that identify
    ChrisH> could be suitable to obtain the hclust clusters.

(see above) --- 
     you see: It *does* pay to read documentation carefully.

    ChrisH> Puzzled,
    ChrisH> Christian

    ChrisH> PS: It seems that each value is typed twice because
    ChrisH> classi is named, and each value is also a name. Try
    ChrisH> as.vector(classi). (Perhaps a little useful help in
    ChrisH> the end?)

or unname(classi) -- which is slightly more expressive in this
case and possibly more desirable in other situations.
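
A small sketch of the difference, using a hypothetical stand-in for one element of "classi" (the values below are made up, not taken from the poster's data). When the names of a vector equal its values, printing shows every number twice: once as a name and once as a value.

```r
# Hypothetical stand-in for classi[[1]]: a named vector whose names
# happen to equal its values.
classe_01 <- c("3" = 3, "7" = 7, "12" = 12)

print(classe_01)             # names are printed as a header row above the values
print(unname(classe_01))     # values only
print(as.vector(classe_01))  # same result for a plain atomic vector

# For a plain named vector the two approaches agree.
same <- identical(unname(classe_01), as.vector(classe_01))
```

unname() removes only names and dimnames, whereas as.vector() drops all attributes of an atomic vector, which is one reason unname() can be preferable elsewhere.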

Martin Maechler, ETH Zurich


    ChrisH> On Fri, 15 Oct 2004, James Foadi wrote:

    >> Hello. I wonder if anyone can help me with this.
    >> 
    >> I'm performing cluster analysis by using hclust in stats package.
    >> My data are contained in a data frame with 10 columns, named "drops".
    >> 
    >> First I create a distance matrix using dist:
    >> 
    >> distanze <- dist(drops)
    >> 
    >> Then I perform cluster analysis via hclust:
    >> 
    >> clusters <- hclust(distanze)
    >> 
    >> At this point I want to view the tree plot, and use plot:
    >> 
    >> plot(clusters)
    >> 
    >> Then, once I have decided which clusters to select, I start identify:
    >> 
    >> classi <- identify(clusters)
    >> 
    >> and click on all clusters to be selected; I then finish by right-clicking.
    >> 
    >> My understanding is that "classi" is now a list containing all individual 
    >> data, grouped in clusters. In my case "classi" contained 10 objects,
    >> simply named [1], [2], etc.
    >> 
    >> To obtain all individual data belonging to one object I thought that it
    >> would have sufficed to type, for instance:
    >> 
    >> classe_01 <- classi[[1]]
    >> 
    >> Unfortunately, rather than obtaining a vector, I obtain a "numeric" where
    >> each value is typed twice.
    >> 
    >> Can anyone explain why, or what I've done wrong?
    >> 
    >> Many thanks,
    >> 
    >> james
    >> -- 
    >> Dr James Foadi
    >> Structural Biology Laboratory
    >> Department of Chemistry
    >> University of York
    >> YORK YO10 5YW
    >> UK



