[R] hierarchical clustering of large dataset

Hans Ekbrand hans at sociologi.cjb.net
Sat Mar 10 09:51:27 CET 2012


On Fri, Mar 09, 2012 at 08:26:01PM -0500, Massimo Di Stefano wrote:
> my target is to have 'groups of species' based on the similarity of their environmental parameters, and build a dendrogram like [2]
> 
> [2] http://massimo-timecapsule.whoi.edu//data/img/manova_clust_matlab.png

> On Mar 9, 2012, at 7:18 PM, Peter Langfelder wrote:
> 
> > Well, you didn't say that column e was a label that you wanted to keep
> > separate. Any other labels in the data? You may not want to use labels
> > in the distance calculation.

If you want to use the results of the cluster analysis as evidence of
similarities and differences between species, you must _not_ include
numeric variables representing labels in the matrix used for the
distance calculation. Including them would mean imposing the expected
result onto the data.

First do the cluster analysis, then test the distribution of species
in clusters.
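
A minimal sketch of that two-step workflow, on made-up data. The
variable names (temperature, salinity), the three species, the choice
of average linkage, and k = 3 are all assumptions for illustration,
not taken from your data set; the chi-squared test is just one
possible way to test the distribution.

```r
# Hypothetical toy data: 60 observations, two environmental variables,
# and a species label that is kept OUT of the distance calculation.
set.seed(1)
species <- factor(rep(c("A", "B", "C"), each = 20))
env <- data.frame(
  temperature = rnorm(60, mean = rep(c(5, 12, 20), each = 20), sd = 1),
  salinity    = rnorm(60, mean = rep(c(30, 33, 36), each = 20), sd = 0.5)
)

# Step 1: cluster on the standardized environmental variables only.
d  <- dist(scale(env))
hc <- hclust(d, method = "average")

# Dendrogram; the species names are used only to label the leaves.
plot(hc, labels = as.character(species))

# Step 2: cut the tree (k = 3 is an assumption) and cross-tabulate
# cluster membership against the species label.
clusters <- cutree(hc, k = 3)
tab <- table(species = species, cluster = clusters)
print(tab)

# Test whether species are unevenly distributed across clusters.
# The p-value is simulated because some cells may have small counts.
print(chisq.test(tab, simulate.p.value = TRUE, B = 2000))
```

The label enters only after the clustering is done, so any
association between species and clusters in the table comes from the
environmental variables and not from the label itself.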

-- 
Hans Ekbrand (http://sociologi.cjb.net) <hans at sociologi.cjb.net>


