[BioC] clustering question

Mon Feb 20 14:03:59 CET 2006

On 2/19/06 23:23, "Kimpel, Mark William" <mkimpel at iupui.edu> wrote:

> I have a general question about clustering of genomic data. The heatmaps
> that are generated are usually scaled row-wise so that variations are
> apparent within rows but not between rows. In looking at the
> documentation of heatmap and hclust, however, it appears that this
> scaling is done after the actual clustering is performed. If heatmap is
> performed on the hclust object with scale="none", it is apparent that
> most of the row clustering is based on overall gene expression levels,
> not on similar column-wise behavior between rows.
> 
> Wouldn't it make sense to scale row-wise before clustering so that the
> row clusters are based more on the correlation of the behavior of rows
> between columns, i.e. two genes would be near each other if the genes
> behaved similarly across samples? I realize that some of this effect may
> be achieved with unscaled data, but it seems to me that the large
> overall expression differences may minimize that.

Mark,

If I understand you correctly, you might want to look at the "distfun"
argument to heatmap. It lets you supply any dissimilarity function you
like, including 1-correlation.
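
As a minimal sketch of why a correlation-based dissimilarity addresses the
concern above: 1 - Pearson correlation ignores a row's overall expression
level, while Euclidean distance is dominated by it. The sketch is plain
Python rather than R, and the gene names and values are made up purely for
illustration. In R, the analogous distfun would presumably be something
like function(m) as.dist(1 - cor(t(m))); treat that as an assumption to
check against the heatmap documentation.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def correlation_distance(x, y):
    """1 - correlation: 0 for identical patterns, 2 for opposite ones."""
    return 1.0 - pearson(x, y)

def euclidean_distance(x, y):
    """Ordinary Euclidean distance, sensitive to overall level."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

# Hypothetical expression values for three genes across four samples.
gene_a = [1.0, 2.0, 3.0, 2.0]          # low expression
gene_b = [101.0, 102.0, 103.0, 102.0]  # same pattern as gene_a, high level
gene_c = [3.0, 2.0, 1.0, 2.0]          # opposite pattern, low level

for name, other in (("gene_b", gene_b), ("gene_c", gene_c)):
    print(name,
          "euclidean:", round(euclidean_distance(gene_a, other), 3),
          "1-correlation:", round(correlation_distance(gene_a, other), 3))
```

With these values, Euclidean distance puts gene_a closer to gene_c (similar
level, opposite behavior), whereas 1 - correlation puts it closer to gene_b
(same behavior across samples, different level).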

Sean