[R] cluster in R

Thu Oct 19 01:39:22 CEST 2006

Dear Chris:

Thanks for the prompt reply!

You are right, the Pearson-based dist has negative values there, so I
should use cor+1 in my case (since negatively correlated genes should be
considered farthest). Thanks.
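
A minimal sketch of the correlation-to-distance step, using base R only.
The toy matrix x is hypothetical and just stands in for dd.df. One
caveat: cor+1 is non-negative, but it grows with the correlation, so
used directly as a distance it would put negatively correlated genes
closest. To make them farthest, the sketch uses 1 - cor instead; treat
that as my assumption about what is wanted, not a general rule.

```r
# Toy data standing in for dd.df (hypothetical): rows are genes, columns are samples
set.seed(1)
x <- matrix(rnorm(20 * 6), nrow = 20)

r <- cor(t(x), method = "pearson")   # row-by-row correlations, values in [-1, 1]

# as.dist() only reshapes the matrix into a "dist" object; it does not
# transform the values, so negative correlations stay negative.
d.raw <- as.dist(r)

# Dissimilarity in [0, 2]: cor = 1 gives 0, cor = -1 gives 2 (farthest)
d.cor <- as.dist(1 - r)

# Alternative if strong negative correlation should count as "close"
d.abs <- as.dist(1 - abs(r))

cl <- cutree(hclust(d.cor), k = 4)   # 4 clusters, as in the Euclidean attempt
table(cl)
```

Either d.cor or d.abs can then be passed to cluster.stats() in place of
the Euclidean dist object.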

As for ?cluster.stats, I double-checked and found that I need to
restart JGR; only then does the help system recognize a newly loaded
package, such as fpc in this case.

Sorry for the confusion,

weiwei

On 10/18/06, Christian Hennig <chrish at stats.ucl.ac.uk> wrote:
> Dear Weiwei,
>
> > btw, ?cluster.stats does not work on my Mac machine.
> >> version
> >              _
> > platform       i386-apple-darwin8.6.1
> > arch           i386
> > os             darwin8.6.1
> > system         i386, darwin8.6.1
> > status
> > major          2
> > minor          3.1
> > year           2006
> > month          06
> > day            01
> > svn rev        38247
> > language       R
> > version.string Version 2.3.1 (2006-06-01)
>
> Because I don't have access to a Mac, I can't tell you anything about
> this, unfortunately.
> I always thought that my package should work on all platforms if it passes
> all the standard tests for packages?
> (Is there anyone else who could comment on this please?)
>
> > I have a sample like this
> >> dim(dd.df)
> > [1] 142  28
> >
> > and I want to cluster rows;
> > first of all, I followed the examples for cluster.stats() by
> > d.dd <- dist(dd.df) # use Euclidean
> > d.4 <- cutree(hclust(d.dd), 4) # 4 clusters I tried
> > cluster.stats(d.dd, d.4) # gives me some results like this:
> >
> > $cluster.size
> > [1] 133   5   2   2
> >
> > $avg.silwidth
> > [1] 0.9857916
> >
> > but when I tried a Pearson-based distance here (by visualization, I
> > think 4 or 5 clusters fit it well), it gave me a very bad
> > avg.silwidth
> >
> > d.dd <- as.dist(cor(t(x), method="pearson")) # is this correct?
> > $cluster.size
> > [1] 86 31  6 19
> >
> > $avg.silwidth
> > [1] -0.09543089
>
> cor can give negative values, which doesn't fit the usual definition
> of a distance. I don't know what as.dist does in this case, but I think
> that, depending on your application, you should rather use the absolute
> value of the correlation, or 1+cor.
>
> > btw, what's $separation? Where can I find a detailed explanation of
> > the output from cluster.stats?
>
> This is documented on the cluster.stats help page:
>
> separation: vector of clusterwise minimum distances of a point in the
>            cluster to a point of another cluster.
>
> Best regards,
> Christian
>
>
> *** --- ***
> Christian Hennig
> University College London, Department of Statistical Science
> Gower St., London WC1E 6BT, phone +44 207 679 1698
> chrish at stats.ucl.ac.uk, www.homepages.ucl.ac.uk/~ucakche
>
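
On the separation definition quoted above, here is a small base-R sketch
of how I read it: for each cluster, the smallest distance from any of
its points to any point in a different cluster. The toy data and the
names dm and separation are hypothetical, and this is my reading of the
help text, not the fpc source.

```r
# Toy data and clustering (hypothetical, not the real dd.df)
set.seed(1)
x  <- matrix(rnorm(20 * 6), nrow = 20)
d  <- dist(x)                        # Euclidean distances between rows
cl <- cutree(hclust(d), k = 3)

dm <- as.matrix(d)                   # full symmetric distance matrix

# For each cluster: minimum distance from a point inside it to a point outside it
separation <- sapply(sort(unique(cl)), function(k) {
  min(dm[cl == k, cl != k])
})
separation
```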

-- 
Weiwei Shi, Ph.D
Research Scientist
GeneGO, Inc.

"Did you always know?"
"No, I did not. But I believed..."
---Matrix III