[R] hclust, does order of data matter?

Peter Langfelder peter.langfelder at gmail.com
Mon Nov 15 23:37:54 CET 2010


On Mon, Nov 15, 2010 at 2:19 PM, Reshmi Chowdhury
<rchowdhury at alumni.upenn.edu> wrote:
> Here is the code I am using:
>
> m <- read.csv("data_unsorted.csv",header=TRUE)
> m <- na.omit(m)
> cs <- hclust(dist(t(m),method="euclidean"),method="complete")
> ds <- as.dendrogram(cs)

As Christian said, you may want to plot the cs tree (i.e., plot(cs))
in both cases and make sure that the differences do not just stem from
tied (equal) distances. Also, inspect the matrix m to make sure that
the first column in "data_unsorted.csv" is interpreted correctly by
the read.csv function: if your first data column is taken as row
names, the dendrograms may indeed look different. Apart from the
ambiguity caused by tied distances, the dendrogram produced by hclust
should not depend on the order of the columns in the input to dist.
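
A minimal sketch of how one might check this, using simulated data
rather than your file: the data frame m, the helper cluster_cols and
the column names s1..s6 below are all made up for illustration. It
clusters the columns of m with the same dist/hclust calls as in your
code, then again after shuffling the columns, and compares the two
trees by their merge heights and cophenetic distances.

```r
set.seed(1)

# Hypothetical stand-in for the real data: 20 rows, 6 named columns of
# continuous values, so tied distances are not expected.
m <- as.data.frame(matrix(rnorm(120), nrow = 20,
                          dimnames = list(NULL, paste0("s", 1:6))))

# Same clustering call as in the original code, wrapped for reuse.
cluster_cols <- function(x) {
  hclust(dist(t(x), method = "euclidean"), method = "complete")
}

cs1 <- cluster_cols(m)
cs2 <- cluster_cols(m[, sample(ncol(m))])  # same data, columns shuffled

# The merge heights should agree.
same_heights <- isTRUE(all.equal(sort(cs1$height), sort(cs2$height)))

# Cophenetic distances, put into a common label order, describe the
# tree itself independently of how the leaves happen to be drawn.
c1 <- as.matrix(cophenetic(cs1))
c2 <- as.matrix(cophenetic(cs2))[rownames(c1), colnames(c1)]
same_tree <- isTRUE(all.equal(c1, c2))

# Check for ties among the pairwise distances.
has_ties <- anyDuplicated(as.vector(dist(t(m)))) > 0

print(c(same_heights = same_heights, same_tree = same_tree,
        has_ties = has_ties))
```

On your own data, replace the simulated m with the result of read.csv
and na.omit. If has_ties comes out TRUE, differences between the two
trees can be explained by the ties alone. The leaf order stored in
cs1$order and cs2$order may differ even when the trees agree; that
only affects how the plot is drawn. For the row-name question, str(m),
dim(m) and head(rownames(m)) show what read.csv actually produced.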

Peter


