[R] CLUSTER Package

Fri Mar 30 15:39:46 CEST 2007

It seems nobody else was willing to help here
(the original poster did not follow the posting guide
at all).

In the meantime, someone else has asked me about part of this,
so let me answer in public:

>>>>> "MM" == Martin Maechler <maechler at stat.math.ethz.ch>
>>>>>     on Mon, 12 Mar 2007 17:23:30 +0100 writes:

    MM> Hi Vallejo, I'm pretty busy currently, and feel your
    MM> question has much more to do with how to use R more
    MM> generally than with using the functions from the cluster
    MM> package.

    MM> So you may get help from other R-help readers, but maybe
    MM> only after you have followed the posting-guide and give
    MM> a reproducible example as you're asked there.

    MM> Regards, Martin Maechler

>>>>> "VallejoR" == Vallejo, Roger <Roger.Vallejo at ARS.USDA.GOV>
>>>>>     on Mon, 12 Mar 2007 10:28:01 -0400 writes:

    VallejoR> Hi Martin, In using the Cluster Package, I have
    VallejoR> results for PAM and DIANA clustering algorithms
    VallejoR> (below "part" and "hier" objects):

    VallejoR> part <- pam(trout, bestk) # PAM results

    VallejoR> hier <- diana(trout) # DIANA results

    VallejoR> GeneNames <- show(RG$genes) # Gene Names are in this object

(What is RG?)

    VallejoR> But I would like also to know what genes (NAMES)
    VallejoR> are included in each cluster. I tried
    VallejoR> unsuccessfully to send these results to output
    VallejoR> files (clusters with gene Names). This must be an
    VallejoR> easy task for a good R programmer. I will
    VallejoR> appreciate very much directions or R code on how
    VallejoR> to send the PAM and DIANA results to output files
    VallejoR> including information on genes (Names) per each
    VallejoR> cluster.

For diana(), a *hierarchical* clustering method {like agnes()}, you need
to decide on the number of clusters yourself.
Then, as the example in help(diana.object) shows,
you can use cutree() to get the grouping vector.

Here's a reproducible example:

library(cluster)
data(votes.repub)
dv <- diana(votes.repub, metric = "manhattan", stand = TRUE)
print(dv)
plot(dv)

## Cut into 2 groups:
dv2 <- cutree(as.hclust(dv), k = 2)
table(dv2) # 8 and 42 group members
rownames(votes.repub)[dv2 == 1]

## For two groups, does the metric matter?
dv0 <- diana(votes.repub, stand = TRUE) # default: Euclidean
dv.2 <- cutree(as.hclust(dv0), k = 2)
table(dv2 == dv.2) ## identical group assignments
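
If you have not settled on the number of clusters yet, here is a small
sketch (not taken from the help page; the range k = 2:4 is an arbitrary
choice for illustration). cutree() also accepts a vector of k values and
then returns a matrix with one column of group labels per k, so the
members of each cluster can be listed by name for any of them:

```r
library(cluster)
data(votes.repub)
dv <- diana(votes.repub, metric = "manhattan", stand = TRUE)

## one column of group labels per requested k (k = 2:4 is arbitrary)
grp <- cutree(as.hclust(dv), k = 2:4)
head(grp)

## state names per cluster for the 3-group solution;
## the row names of 'grp' are the observation labels
by3 <- split(rownames(grp), grp[, "3"])
by3
```

With your own data, the row names of the matrix passed to diana() play
the role of the state names here.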

----------------

For pam(), it's even simpler:

data(ruspini)
pr <- pam(ruspini, 4)
plot(pr)

## (interactively: hit <Return> to see the next plot)
str(pr)
## or
summary(pr)
## .. shows you that there's a component 'clustering' :

pr$clustering
## a grouping vector with case-labels {your Gene names}; here "1","2",..,"75":

## and to get them ``visually'':
split(rownames(ruspini), pr$clustering)
## $`1`
##  [1] "1"  "2"  "3"  "4"  "5"  "6"  "7"  "8"  "9"  "10" "11" "12" "13" "14" "15"
## [16] "16" "17" "18" "19" "20"

## $`2`
##  [1] "21" "22" "23" "24" "25" "26" "27" "28" "29" "30" "31" "32" "33" "34" "35"
## [16] "36" "37" "38" "39" "40" "41" "42" "43"

## $`3`
##  [1] "44" "45" "46" "47" "48" "49" "50" "51" "52" "53" "54" "55" "56" "57" "58"
## [16] "59" "60"

## $`4`
##  [1] "61" "62" "63" "64" "65" "66" "67" "68" "69" "70" "71" "72" "73" "74" "75"
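
As for sending this to an output file, which was the original question:
a minimal sketch, assuming a tab-separated text file with one row per
case is what is wanted. The file name "pam-clusters.txt" and the column
names are made up for this example, and tempdir() is used only so that
nothing is written into your working directory.

```r
library(cluster)
data(ruspini)
pr <- pam(ruspini, 4)

## one row per case: its label and the cluster it was assigned to
res <- data.frame(name    = names(pr$clustering),
                  cluster = pr$clustering,
                  row.names = NULL)

## hypothetical file name; replace with the path you actually want
out <- file.path(tempdir(), "pam-clusters.txt")
write.table(res, file = out, sep = "\t", quote = FALSE, row.names = FALSE)

## read it back to check what was written
head(read.delim(out))
```

For diana(), the same data frame can be built from the grouping vector
that cutree() returns, using its names() as the 'name' column.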