[R] How to Ignore NaN values in Rows when using hclust function in making Heatmap??

Wed Sep 22 22:41:52 CEST 2010

I am making heatmaps for a dataset (~ 300*600 matrix) with the following R
script (I am not familiar with R and this is the first time I am using it).

library("gplots")
library("Cairo")
# Read the data and use the Name column as row names
mydata <- read.csv(file="data.csv", header=TRUE, sep=",")
rownames(mydata) <- mydata$Name
mydata <- mydata[,2:297]
mydatamatrix <- data.matrix(mydata)
# Scale each row (scale() works on columns, hence the two transposes)
mydatascale <- t(scale(t(mydatamatrix)))
# Cluster rows (Pearson) and columns (Spearman) on 1 - correlation
hr <- hclust(as.dist(1-cor(t(mydatascale), method="pearson")),
             method="complete")
hc <- hclust(as.dist(1-cor(mydatascale, method="spearman")),
             method="complete")
# Cut each tree at half its maximum height and pick one color per cluster
myclhr <- cutree(hr, h=max(hr$height)/2); mycolhr <- sample(rainbow(256))
myclhc <- cutree(hc, h=max(hc$height)/2); mycolhc <- sample(rainbow(256))
mycolhr <- mycolhr[as.vector(myclhr)]
mycolhc <- mycolhc[as.vector(myclhc)]
# Draw the heatmap to a JPEG file
jpeg("scaleRow.jpg", height=6+2/3, width=6+2/3, units="in", res=1200)
heatmap.2(mydatamatrix, Rowv=as.dendrogram(hr), Colv=as.dendrogram(hc),
          dendrogram="both", scale="row", col=rev(heat.colors(10)),
          cexRow=0.08, cexCol=0.08, trace="none", density.info="none",
          symkey=FALSE, key=TRUE, keysize=1.5, margins=c(5,8),
          RowSideColors=mycolhr, ColSideColors=mycolhc)
dev.off()

My question is this: the dataset has a good number of rows (~17) that are
zero in every column.  When I run these two commands,

hr <- hclust(as.dist(1-cor(t(mydatascale), method="pearson")),
             method="complete")
hc <- hclust(as.dist(1-cor(mydatascale, method="spearman")),
             method="complete")

I get the error msg:  
error in hclust (as.dist(1-cor(t(mydatascale), method="pearson")), :
NA/NaN/Inf in foreign function call (arg 11).  

The problem seems to arise when a row is NaN in every column (presumably
the all-zero rows end up as NaN after the row scaling), because the script
runs fine when I delete those rows.  I don't know how to work around this
error.  I have to include these rows and also cluster them in the heatmap.
Is there a way to do this?  Please help me!
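
To illustrate, a small made-up matrix (not my real data) appears to
reproduce the problem.  The values and the names m, ms, d and res are only
for illustration, and this is just a sketch of what I think is going on:

```r
# Made-up 3 x 4 matrix; row "b" is all zeros, like ~17 rows in my data.
m <- matrix(c(1, 2, 3, 4,
              0, 0, 0, 0,
              4, 1, 3, 2),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("a", "b", "c"), NULL))

# Row-wise scaling: a constant row has standard deviation 0,
# so (0 - 0) / 0 gives NaN for every entry of that row.
ms <- t(scale(t(m)))
print(ms["b", ])

# Every correlation involving the NaN row is NA, so the distance
# object contains NA values.
d <- as.dist(1 - cor(t(ms), method = "pearson"))
print(any(is.na(d)))

# hclust() stops with an error on such input; try() only captures it here.
res <- try(hclust(d, method = "complete"), silent = TRUE)
print(inherits(res, "try-error"))
```

So the NaN values seem to come from the scaling step and not from the
input file itself.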

In addition to the above, my dataset has duplicate row names.  I want to
keep them that way, but every time I run the script I get a warning message
about the row names not being unique.  Is there a way to ignore this message
and keep my original row names instead of renaming them?
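
One thing that might work, as far as I can tell, is to set the names on the
matrix and not on the data frame, since matrix row names do not have to be
unique.  A sketch with a made-up data frame (df, s1 and s2 are hypothetical
stand-ins for my file and its columns):

```r
# Hypothetical stand-in for data.csv, with a repeated entry in Name.
df <- data.frame(Name = c("g1", "g1", "g2"),
                 s1 = c(1, 2, 3),
                 s2 = c(4, 5, 6))

# Convert only the numeric columns to a matrix, then attach the names
# to the matrix; duplicated names are accepted there.
m <- data.matrix(df[, -1])
rownames(m) <- as.character(df$Name)
print(rownames(m))
```

I have not checked whether heatmap.2 labels the rows as expected with
names set this way.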

Thanks!!