[BioC] Problems with using Pearson correlation in hcluster / hclust2treeview

Eric Liaw [guest] guest at bioconductor.org
Mon Feb 17 10:02:51 CET 2014


Hello,

Are there cases in which the hcluster function of amap fails when called with method = "pearson" and link = "average" or "complete"? In other words, is there a mathematical reason why using the Pearson correlation as a distance metric would yield an undefined clustering, or did I encounter a bug?

Full story:
I ran into an error while using the hclust2treeview function in the ctc package; it stems from such a call to hcluster.
Code:
data <- read.table('file.dat', header=TRUE, sep='\t')
# This calls hr <- hcluster(coverage, method = "pearson", link = "average") internally
clusterings <- hclust2treeview(data, file='filename.cdt', method='pearson', keep.hclust=TRUE)

I traced the problem to the 5th through 9th entries of hr$order, which had the value -5744; the following error was thrown when that negative number was used as an index:
Error in `[.default`(xj, i) :
  only 0's may be mixed with negative subscripts
Calls: hclust2treeview ... r2cdt -> [ -> [.data.frame -> [ -> [.factor -> NextMethod
Execution halted
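
The error itself can be reproduced outside of ctc. The following is a minimal sketch in base R only; `ord` and `labels` are hypothetical stand-ins for hr$order and a factor column of the data, not values from the actual run. It shows that a single negative entry among positive positions is enough to trigger the subscript error, and that a valid order vector should be a permutation of 1..n.

```r
# Hypothetical stand-in for hr$order: valid positions plus one negative value.
ord <- c(3, 1, -5744, 2)

# Hypothetical stand-in for a factor column indexed by that order.
labels <- factor(c("g1", "g2", "g3"))

# Mixing positive and negative subscripts is an error in base R;
# capture the message instead of halting.
res <- tryCatch(labels[ord], error = function(e) conditionMessage(e))
print(res)

# A well-formed order vector is a permutation of 1..n; this one is not.
is_permutation <- identical(sort(as.integer(ord)), seq_along(ord))
print(is_permutation)
```

Running the same permutation check on hr$order right after the hcluster call would show whether the clustering result is already malformed before r2cdt is reached.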

I tried method = "euclidean" and no error appeared, but I would prefer to use another distance metric, or at least to know why I can't use the Pearson correlation. My data file seems to be correctly formatted: it consists of a header line followed by a matrix of non-negative integers.
I found this related help thread: http://r.789695.n4.nabble.com/hierarchical-clustering-with-pearson-s-coefficient-td4662788.html
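
One mathematical case that may be worth ruling out: the Pearson correlation is undefined (0/0) for a row with zero variance, which can easily occur in a matrix of non-negative integer counts. Whether this is what breaks hcluster here is only an assumption and has not been confirmed for amap. The sketch below uses base R only, with a toy matrix `m` standing in for the real data, to find constant rows and show that they produce missing values in a correlation-based distance.

```r
# Toy matrix standing in for the real data; row "c" is constant.
m <- rbind(a = c(1, 2, 3, 4),
           b = c(2, 4, 6, 9),
           c = c(5, 5, 5, 5))

# Rows with zero standard deviation have no defined Pearson correlation.
row_sd <- apply(m, 1, sd)
constant_rows <- names(row_sd)[row_sd == 0]
print(constant_rows)

# cor() works on columns, so transpose to correlate rows.
# It warns about the zero standard deviation; suppressed here for brevity.
r <- suppressWarnings(cor(t(m)))

# A correlation-based distance inherits the missing values.
d <- 1 - r
print(any(is.na(d)))
```

If the real data contain such rows, dropping them before clustering (for example `data[apply(data, 1, sd) > 0, ]`, assuming all columns are numeric) and re-running with method = "pearson" would test this hypothesis.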

Thanks,
Eric Liaw

Stanford University, undergraduate student

 -- output of sessionInfo(): 

R version 2.15.2 (2012-10-26)
Platform: x86_64-pc-linux-gnu (64-bit)

locale:
[1] C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] ctc_1.32.0     amap_0.8-7     Biobase_2.14.0

loaded via a namespace (and not attached):
[1] tools_2.15.2

--
Sent via the guest posting facility at bioconductor.org.


