[BioC] Display scale on hclust heatmap

Warnes, Gregory R gregory_r_warnes at groton.pfizer.com
Tue Dec 2 17:13:44 MET 2003


Hi Anthony,

I'm attaching a revised heatmap function that will be migrating into the
standard R code.  If you set 'scale="none"' you will automatically get a
color key. [If scaling is on, different rows/columns have different scales
so a color key doesn't make any sense.]  See the text version of the help
page I'm attaching as well for details on how to select better colors and to
control the break points.

-Greg




> -----Original Message-----
> From: Anthony Bosco [mailto:anthonyb at ichr.uwa.edu.au]
> Sent: Tuesday, December 02, 2003 5:34 AM
> To: bioconductor at stat.math.ethz.ch
> Subject: [BioC] Display scale on hclust heatmap
> 
> 
> Hi.
> 
> I have figured out how to hclust and label plot, but I am having 
> trouble displaying the legend for the heat colours on the plot, and 
> altering the colours.
> 
> 
> Can anyone help?
> 
> 
> Regards
> 
> 
> Anthony
> -- 
> ______________________________________________
> 
> Anthony Bosco - Cell Biology Research Assistant
> 
> Institute for Child Health Research
> (Company Limited by Guarantee ACN 009 278 755)
> Subiaco, Western Australia, 6008
> 
> Ph 61 8 9489  , Fax 61 8 9489 7700
> email anthonyb at ichr.uwa.edu.au
> 
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://www.stat.math.ethz.ch/mailman/listinfo/bioconductor
> 



-------------- next part --------------
A non-text attachment was scrubbed...
Name: heatmap.2.R
Type: application/octet-stream
Size: 11920 bytes
Desc: not available
Url : https://www.stat.math.ethz.ch/pipermail/bioconductor/attachments/20031202/0f7a1ccc/heatmap.2-0001.obj
-------------- next part --------------
Draw a Heat Map

Description:

     A heat map is a false color image (basically 'image(t(x))') with a
     dendrogram added to the left side  and/or to the top.  Typically,
     reordering of the rows and columns according to some set of values
     (row or column means) within the restrictions imposed by the
     dendrogram is carried out.

Usage:

     heatmap <- function (x,

                          #-- dendrogram control --#
                          Rowv=NULL,
                          Colv=if(symm)"Rowv" else NULL,
                          distfun = dist,
                          hclustfun = hclust,
                          dendogram = c("both","row","column","none"),
                          symm = FALSE,

                          #-- data scaling --#
                          scale = c("none","row", "column"),
                          na.rm=TRUE,

                          #-- image plot --#
                          revC = identical(Colv, "Rowv"),
                          add.expr,
                          breaks,
                          col=heat.colors(length(breaks)-1),

                          #-- block separation --#
                          colsep,
                          rowsep,
                          sepcolor="white",

                          #-- cell labeling --#
                          cellnote,
                          notecex=1.0,
                          notecol="cyan",

                          #-- level trace --#
                          trace=c("column","row","both","none"),
                          tracecol="yellow",
                          hline=median(breaks),
                          vline=median(breaks),
                          linecol=tracecol,

                          #-- Row/Column Labeling --#
                          margins = c(5, 5),
                          ColSideColors,
                          RowSideColors,
                          cexRow = 0.2 + 1/log10(nr),
                          cexCol = 0.2 + 1/log10(nc),
                          labRow = NULL,
                          labCol = NULL,

                          #-- color key + density info --#
                          key = TRUE,
                          density.info=c("histogram","density","none"),
                          denscol="yellow",

                          #-- plot labels --#
                          main = NULL,
                          xlab = NULL,
                          ylab = NULL,

                          #-- extras --#
                          ...
                          )

Arguments:

       x: numeric matrix of the values to be plotted. 

    Rowv: determines if and how the _row_ dendrogram should be
          reordered.  Either a 'dendrogram' or a vector of values used
          to reorder the row dendrogram or 'FALSE' to suppress
          reordering or by default, 'NULL', see _Details_ below.

    Colv: determines if and how the _column_ dendrogram should be
          reordered.  Has the same options as the 'Rowv' argument above
          and _additionally_, when 'x' is a square matrix, 'Colv =
          "Rowv"' means that columns should be treated identically to
          the rows.

 distfun: function used to compute the distance (dissimilarity) between
          both rows and columns.  Defaults to 'dist'.

hclustfun: function used to compute the hierarchical clustering when
          'Rowv' or 'Colv' are not dendrograms.  Defaults to 'hclust'.

dendogram: character string indicating whether to draw 'none', 'row',
          'column' or 'both' dendrograms.  Defaults to 'both'.

    symm: logical indicating if 'x' should be treated *symm*etrically;
          can only be true when 'x' is a square matrix.

   scale: character indicating if the values should be centered and
          scaled in either the row direction or the column direction,
          or none.  As the usage above lists '"none"' first, the
          default is '"none"'.

   na.rm: logical indicating whether 'NA''s should be removed.

    revC: logical indicating if the column order should be 'rev'ersed
          for plotting, such that e.g., for the symmetric case, the
          symmetry axis is as usual.

add.expr: expression that will be evaluated after the call to 'image'. 
          Can be used to add components to the plot.

  breaks: (optional) Either a numeric vector indicating the splitting
          points for binning 'x' into colors, or an integer number of
          break points to be used, in which case the break points will
          be spaced equally between 'min(x)' and 'max(x)'.

     col: colors used for the image. Defaults to heat colors
          ('heat.colors').

colsep,rowsep,sepcolor: (optional) vector of integers indicating which
          columns or rows should be separated from the preceding
          columns or rows by a narrow space of color 'sepcolor'.

cellnote: (optional) matrix of character strings which will be placed
          within each color cell, e.g. p-value symbols.

 notecex: (optional) numeric scaling factor for 'cellnote' items.

 notecol: (optional) character string specifying the color for
          'cellnote' text.  Defaults to "cyan".

   trace: character string indicating whether a solid "trace" line
          should be drawn across 'row's or down 'column's, 'both' or
          'none'. The distance of the line from the center of each
          color-cell is proportional to the size of the measurement.
          Defaults to 'column'.

tracecol: character string giving the color for "trace" line. Defaults
          to "yellow".

hline,vline,linecol: Vector of values within cells where a horizontal
          or vertical dotted line should be drawn.  The color of the
          line is controlled by 'linecol'.  Horizontal lines are only
          plotted if 'trace' is 'row' or 'both'.  Vertical lines are
          only drawn if 'trace' is 'column' or 'both'.  'hline' and
          'vline' default to the median of the breaks, 'linecol'
          defaults to the value of 'tracecol'.

 margins: numeric vector of length 2 containing the margins (see
          'par(mar= *)') for column and row names, respectively.

ColSideColors: (optional) character vector of length 'ncol(x)'
          containing the color names for a horizontal side bar that may
          be used to annotate the columns of 'x'.

RowSideColors: (optional) character vector of length 'nrow(x)'
          containing the color names for a vertical side bar that may
          be used to annotate the rows of 'x'.

cexRow, cexCol: positive numbers, used as 'cex.axis' for the row or
          column axis labeling.  The defaults currently only use the
          number of rows or columns, respectively.

labRow, labCol: character vectors with row and column labels to use;
          these default to 'rownames(x)' or 'colnames(x)',
          respectively.

     key: logical indicating whether a color-key should be shown.

density.info: character string indicating whether to superimpose a
          'histogram', a 'density' plot, or no plot ('none') on the
          color-key.

 denscol: character string giving the color for the density display
          specified by 'density.info', defaults to the same value as
          'tracecol'.

main, xlab, ylab: main, x- and y-axis titles; defaults to none.

     ...: additional arguments passed on to 'image' 
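
     The 'breaks' and 'col' arguments above can be illustrated with a
     small base R sketch.  It only shows one way to get equally spaced
     break points and a matching number of colors; it is not taken
     from the attached function, and 'n_breaks' and 'bin' are made-up
     names.

```r
# Illustrative numeric matrix; any numeric matrix would do.
x <- as.matrix(mtcars[, c("mpg", "hp", "wt")])

# An integer 'breaks' is read here as "this many equally spaced break points".
n_breaks <- 8
breaks <- seq(min(x, na.rm = TRUE), max(x, na.rm = TRUE), length.out = n_breaks)

# n break points delimit n - 1 bins, hence length(breaks) - 1 colors.
col <- heat.colors(length(breaks) - 1)

# Assign every value of x to a bin (the matrix is treated as a plain vector).
bin <- cut(x, breaks = breaks, include.lowest = TRUE, labels = FALSE)
```

     Passing an explicit numeric vector as 'breaks' instead would
     allow unequal bins, e.g. to give more color resolution near zero.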

Details:

     If either 'Rowv' or 'Colv' are dendrograms they are honored (and
     not reordered).  Otherwise, dendrograms are computed as 'dd <-
     as.dendrogram(hclustfun(distfun(X)))' where 'X' is either 'x' or
     't(x)'.

     If either is a vector (of "weights") then the appropriate
     dendrogram is reordered according to the supplied values subject
     to the constraints imposed by the dendrogram, by 'reorder(dd,
     Rowv)', in the row case. If either is missing, as by default, then
     the ordering of the corresponding dendrogram is by the mean value
     of the rows/columns, i.e., in the case of rows, 'Rowv <-
     rowMeans(x, na.rm=na.rm)'. If either is 'NULL', _no reordering_
     will be done for the corresponding side.
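
     The dendrogram computation and reordering described above can be
     sketched for the row case with functions from the 'stats'
     package.  This is a rough illustration using the default
     'distfun' and 'hclustfun', not the body of the attached function;
     'x_ordered' is a made-up name.

```r
x <- as.matrix(mtcars)

# Cluster the rows and convert the result to a dendrogram.
dd <- as.dendrogram(hclust(dist(x)))

# Reorder by row means, as far as the dendrogram's structure allows.
Rowv <- rowMeans(x, na.rm = TRUE)
dd <- reorder(dd, Rowv)

# Row index permutation implied by the (reordered) dendrogram.
rowInd <- order.dendrogram(dd)
x_ordered <- x[rowInd, ]
```

     For columns the same steps would be applied to 't(x)' with
     'colMeans(x)' as the weights.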

     If 'scale = "row"' the rows are scaled to have mean zero and
     standard deviation one.  There is some empirical evidence from
     genomic plotting that this is useful.
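
     Row scaling of this kind can be written with 'sweep'.  The lines
     below are a minimal sketch of centering and scaling by row, not
     necessarily how the attached function does it; 'x_scaled' and
     'row_sd' are made-up names.

```r
x <- as.matrix(mtcars)

# Center each row at zero ("-" is the default operation of sweep).
x_scaled <- sweep(x, 1, rowMeans(x, na.rm = TRUE))

# Divide each row by its standard deviation.
row_sd <- apply(x_scaled, 1, sd, na.rm = TRUE)
x_scaled <- sweep(x_scaled, 1, row_sd, "/")
```

     After this step every row is on its own scale, which is why a
     single color key is only meaningful with 'scale="none"'.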

     The default colors range from red to white ('heat.colors') and are
     not pretty.  Consider using enhancements such as the
     'RColorBrewer' package, <URL:
     http://cran.r-project.org/src/contrib/PACKAGES.html#RColorBrewer>
     to select better colors.
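
     As a hedged illustration of choosing other colors, the sketch
     below builds one palette with base R's 'colorRampPalette' and
     one with 'RColorBrewer::brewer.pal' when that package happens to
     be installed.  The palette name "YlOrRd" and the variable names
     are just examples.

```r
# Base R: interpolate 16 colors along a blue-white-red ramp.
blue_red <- colorRampPalette(c("blue", "white", "red"))(16)

# RColorBrewer if available, otherwise a comparable base R ramp.
pal <- if (requireNamespace("RColorBrewer", quietly = TRUE)) {
  RColorBrewer::brewer.pal(9, "YlOrRd")
} else {
  colorRampPalette(c("lightyellow", "orange", "darkred"))(9)
}
```

     Either vector could then be supplied as 'col', together with a
     'breaks' vector that has one more element than there are colors.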

Value:

     Invisibly, a list with components 

  rowInd: *r*ow index permutation vector as returned by
          'order.dendrogram'.

  colInd: *c*olumn index permutation vector.

Note:

     The original rows and columns are reordered _in any case_ to match
     the dendrogram, e.g., the rows by 'order.dendrogram(Rowv)' where
     'Rowv' is the (possibly 'reorder()'ed) row dendrogram.

     'heatmap.2()' uses 'layout' and draws the 'image' in the lower
     right corner of a 2x2 layout.  Consequently, it can *not* be
     used in a multi column/row layout, i.e., when 'par(mfrow= *)' or
     'par(mfcol= *)' has been called.

Author(s):

     Andy Liaw, original; R. Gentleman, M. Maechler, W. Huber, G.
     Warnes, revisions.

See Also:

     'image', 'hclust'

Examples:

      data(mtcars)
      x  <- as.matrix(mtcars)
      rc <- rainbow(nrow(x), start=0, end=.3)
      cc <- rainbow(ncol(x), start=0, end=.3)
      hv <- heatmap(x, col = cm.colors(256), scale="column",
                    RowSideColors = rc, ColSideColors = cc, margin=c(5,10),
                    xlab = "specification variables", ylab= "Car Models",
                    main = "heatmap(<Mtcars data>, ..., scale = \"column\")",
                    tracecol="green")

      str(hv) # the two re-ordering index vectors

      data(attitude)
      round(Ca <- cor(attitude), 2)
      symnum(Ca) # simple graphic

      # with reorder
      heatmap(Ca,             symm = TRUE, margin=c(6,6), trace="none" )

      # without reorder
      heatmap(Ca, Rowv=FALSE, symm = TRUE, margin=c(6,6), trace="none" )

      ## For variable clustering, rather use distance based on cor():
      data(USJudgeRatings)
      symnum( cU <- cor(USJudgeRatings) )

      hU <- heatmap(cU, Rowv = FALSE, symm = TRUE, col = topo.colors(16),
                   distfun = function(c) as.dist(1 - c), trace="none")

      ## The Correlation matrix with same reordering:
      hM <- format(round(cU[hU[[1]], hU[[2]]],2))
      hM

      # now with the correlation matrix on the plot itself

      heatmap(cU, Rowv = FALSE, symm = TRUE, col = rev(heat.colors(16)),
                  distfun = function(c) as.dist(1 - c), trace="none",
                  cellnote=hM)

      ## genechip data examples
      ## Don't run: 
      library(affy)
      data(SpikeIn)
      pms <- SpikeIn@pm

      # just the data, scaled across rows
      heatmap(pms, col=rev(heat.colors(16)), main="SpikeIn@pm",
                   xlab="Relative Concentration", ylab="Probeset",
                   scale="row")

      # fold change vs "12.50" sample
      data <- pms / pms[,"12.50"]
      data <- ifelse(data>1,data,-1/data)
      heatmap(data, breaks=8, col=redgreen, tracecol="blue",
                    main="SpikeIn@pm Fold Changes\nrelative to 12.50 sample",
                    xlab="Relative Concentration", ylab="Probeset")
      ## End Don't run


