[R] hclust too slow?

akonla anna.konstorum at gmail.com
Tue Nov 17 20:15:02 CET 2009


Hi,

I am new to clustering in R and I have a dataset with approximately 17,000
rows and 8 columns, where each value is a number with three decimal places.
I would like to cluster the 8 columns so that I get a dendrogram as output.
So, I am simply creating a distance matrix from my data, running the
'hclust' function on it, and then plotting the results (see below; my data
is contained in the text file).

x <- read.table("SEP_IR_1113_3.txt", header = TRUE, sep = "\t")
x.dist <- dist(x)
hc <- hclust(x.dist, method = "average")
plot(hc, hang = -1)

Unfortunately, the hclust function, although it produces no errors, takes a
very long time to run (>4 hours), and my computer kills the program before
it finishes. I don't think this data set is large enough to cause such a
long computing time, and I have plenty of memory since I am running this
analysis on our university computing cluster.
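
For scale, here is a rough sketch of what the calls above amount to on data
of this shape. It relies on the documented behaviour that dist() computes
distances between the rows of its argument, and it uses a small synthetic
table in place of SEP_IR_1113_3.txt. The names n, n_pairs_rows, col.dist and
hc.cols are illustrative only, and transposing with t() is one possible
reading of "cluster the 8 columns", not a confirmed fix.

```r
# Synthetic stand-in for the real file (assumption: 8 numeric columns).
set.seed(1)
x <- as.data.frame(matrix(round(rnorm(200 * 8), 3), ncol = 8))

# dist() measures distances between ROWS, so dist(x) on the real data
# needs n * (n - 1) / 2 pairwise distances for n = 17000 rows.
n <- 17000
n_pairs_rows <- n * (n - 1) / 2

# To cluster the 8 columns instead, transpose so that columns become rows.
col.dist <- dist(t(x))
hc.cols <- hclust(col.dist, method = "average")
plot(hc.cols, hang = -1)
```

With 17,000 rows, n_pairs_rows comes to 144,491,500 distances, whereas the
transposed call on 8 columns involves only 28, so it should finish almost
immediately if a column dendrogram is what is wanted.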

Has anyone run into this problem before, and does anyone have tips on how I
can speed up processing? I can provide more information about my problem if
necessary.

Thank you!
-- 
View this message in context: http://old.nabble.com/hclust-too-slow--tp26395774p26395774.html
Sent from the R help mailing list archive at Nabble.com.



