[BioC] dendrograms on heatmap.2 (gplots)

Steve Lianoglou mailinglist.honeypot at gmail.com
Sat May 28 18:20:07 CEST 2011


Hi Gavin,

On Sat, May 28, 2011 at 11:06 AM, Gavin Koh <gavin.koh at gmail.com> wrote:
> Dear Steve, I have healthy controls and patients, so two groups.
> k-means misclassifies a few study subjects, but by and large,
> redrawing the dendrogram while preserving the ordering is not going to
> serious mess things up.

Sorry if my post came across in the wrong way -- I'm not trying to
imply that you are trying to show something that isn't true, or
anything like that. I'm actually not sure how you interpreted my
email, because I'm not sure what you're trying to say in your reply,
so let me try another way :-)

I guess my point is that: yes, you have two groups when you condition
group assignment based on a state we call "healthy" and "affected" (or
whatever you call them here).

If you are asking to group your patients in a different way -- this
time using your gene expression profiles -- it's not totally unusual
for things to change a bit.

So, again, I'm not trying to lecture here, but this is the way I
understand it. If I'm wrong, feel free to correct me:

The distances we "walk along" the arms/branches of the dendrogram say
something about the distance between the "things" they are connecting.
If you didn't change any params in your heatmap call, the distance
between your vectors defaults to the euclidean distance, and that just
is what it is. The dendrogram is then drawn to
respect those distances. If you move things around, then you are
saying something different about those distances, right?
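
To make that concrete, here is a minimal sketch on made-up data (the
matrix `m` and its sample names are only for illustration). It uses
dist() and hclust() with their defaults, which, as far as I know, is
what heatmap.2 falls back to when distfun and hclustfun are left
alone, and checks that the heights you "walk" up the tree come
straight from the euclidean distances:

```r
set.seed(1)
# Toy data: 50 "genes" (rows) x 6 "samples" (columns); purely illustrative
m <- matrix(rnorm(50 * 6), nrow = 50,
            dimnames = list(NULL, paste0("s", 1:6)))

# Euclidean distance between samples (dist() works on rows, hence t())
d <- dist(t(m))

# Complete-linkage clustering, the hclust() default
hc <- hclust(d)

# Cophenetic distance: the height at which two samples are first joined
coph <- cophenetic(hc)

# The lowest join in the tree is exactly the closest pair of samples ...
lowest.join.is.closest.pair <- isTRUE(all.equal(min(hc$height), min(d)))

# ... and with complete linkage no pair is joined below its own distance
never.below.own.distance <- all(coph >= d - 1e-12)
```

Both flags should come out TRUE, so the tree cannot place two samples
closer together than the chosen distance says they are.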

In this context, I'm confused about your point when you say "redrawing
the dendrogram while preserving the ordering is not going to serious
mess things up" -- what ordering do you expect to be preserved ... is
it the columns of the matrix that you passed in? If you don't want to
move those columns around, then do you want the branches of the tree
to criss-cross or something?

The way I see it, you are kind of stuck if you intend to draw a
dendrogram at all.

So -- how can we move things around in a natural way?

Maybe you can choose a different distance measure?
Maybe you can normalize your data in a different way?
Maybe you can plot a subset of genes -- maybe those with the highest
variance across all your data, which might result in new distances
calculated, and a different drawing of the branches on the tree.
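
A rough sketch of those three options, on simulated data (the matrix
`m`, the cutoff of 50 genes and the helper `cor.dist` are assumptions
for illustration, not anything from your analysis). heatmap.2 takes
the distance function through its distfun argument; the plotting call
is guarded so the rest runs even without gplots installed:

```r
set.seed(2)
# Simulated data: 200 "genes" x 20 "samples"
m <- matrix(rnorm(200 * 20), nrow = 200,
            dimnames = list(paste0("g", 1:200), paste0("s", 1:20)))

# Subset: keep the 50 genes with the highest variance across samples
gene.var <- apply(m, 1, var)
top <- m[order(gene.var, decreasing = TRUE)[1:50], ]

# Normalize differently: z-score each gene (row) before clustering
top.z <- t(scale(t(top)))

# Different distance: 1 - Pearson correlation between the rows of x
# (hypothetical helper; heatmap.2 passes it x for rows, t(x) for columns)
cor.dist <- function(x) as.dist(1 - cor(t(x)))

if (requireNamespace("gplots", quietly = TRUE)) {
  gplots::heatmap.2(top.z, distfun = cor.dist, trace = "none")
}
```

Each of these changes the distances themselves, so the tree that comes
out is still an honest picture of the (new) distances.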

You could always pass in your own dendrogram structure to the heatmap
and "arbitrarily" calculate distances so that the tree draws as you
want, but I don't think that's something you'd want to do anyway.
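
For what it's worth, passing in your own dendrogram does not have to
mean inventing distances. A milder variant, sketched below on
simulated data, keeps the real distances and only rotates branches
with reorder(), which changes the left-to-right leaf order where the
tree allows it but leaves every join height alone. The `group` vector
(1 = control, 2 = affected) and the data are assumptions for
illustration:

```r
set.seed(3)
# Simulated data: 100 "genes" x 20 "samples"
m <- matrix(rnorm(100 * 20), nrow = 100,
            dimnames = list(NULL, paste0("s", 1:20)))
group <- rep(c(1, 2), each = 10)  # hypothetical: 1 = control, 2 = affected

# Cluster the samples (columns) on their real euclidean distances
dend <- as.dendrogram(hclust(dist(t(m))))

# Rotate branches so that low-weight (control) leaves drift to the left
dend2 <- reorder(dend, wts = group, agglo.FUN = mean)

# Same tree, possibly different leaf order: the distances read off the
# two dendrograms are identical once the samples are lined up by name
lab <- colnames(m)
c1 <- as.matrix(cophenetic(dend))[lab, lab]
c2 <- as.matrix(cophenetic(dend2))[lab, lab]
same.distances <- isTRUE(all.equal(c1, c2))

if (requireNamespace("gplots", quietly = TRUE)) {
  # Colv accepts a dendrogram object for the columns
  gplots::heatmap.2(m, Colv = dend2, trace = "none")
}
```

This will not untangle samples that really cluster with the other
group; it only picks, among the drawings the tree permits, one that
puts the controls as far left as possible.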

Another approach to show "likeness" between expression profiles is to
not focus on the dendrogram lining up "just so", but rather to add a
bar of colors over the samples (columns) of your data by using the
"ColSideColors" parameter. Say the first 10 columns of your matrix are
from the 10 controls, and the last 10 are from the affected patients.
You can do:

R> heatmap.2(my.data, ..., ColSideColors=c(rep('blue', 10), rep('red', 10)))

If, as you say, the expression profiles are *mostly* similar, you'll
see that, by and large, the blue experiments will be "chunked" with
blue, and the red experiments with red, which might make the same
point you're trying to make with the dendrogram.
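
A self-contained version of that call, on simulated data, in case it
helps to try it out. The matrix, the sample names and the shift added
to the "affected" columns are all made up for illustration; the colors
are given in the original column order and heatmap.2 moves them along
with the columns:

```r
set.seed(4)
# Simulated data: 100 "genes" x 20 "samples";
# columns 1-10 are controls, columns 11-20 are affected
m <- matrix(rnorm(100 * 20), nrow = 100,
            dimnames = list(NULL, c(paste0("ctrl", 1:10),
                                    paste0("aff", 1:10))))
# Give the affected samples a shift in the first 30 genes
m[1:30, 11:20] <- m[1:30, 11:20] + 3

# One color per column, in the order of the columns of m
side.cols <- c(rep("blue", 10), rep("red", 10))

# Preview how the colors line up under a plain column clustering
# (heatmap.2 may rotate some branches, but the chunks stay the same)
hc <- hclust(dist(t(m)))
clustered.cols <- side.cols[hc$order]

if (requireNamespace("gplots", quietly = TRUE)) {
  gplots::heatmap.2(m, ColSideColors = side.cols, trace = "none")
}
```

Printing `clustered.cols` gives a quick text version of the color bar:
long runs of one color mean the clustering mostly agrees with the
healthy/affected labels, and a stray color in a run points at a sample
that clusters with the other group.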

HTH,
-steve

-- 
Steve Lianoglou
Graduate Student: Computational Systems Biology
 | Memorial Sloan-Kettering Cancer Center
 | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact


