[BioC] Heatmap with 7120x500 array
mtmorgan at fhcrc.org
Sat Aug 28 22:42:52 CEST 2010
On 08/28/2010 10:52 AM, Gerhard Thallinger wrote:
> Hi Gaston,
>> I'm trying to produce a heat map that clusters 7120 genes
>> into 6 groups based on 500 conditions. I'm using kmeans and
>> then image, but I've two problems. The first one is that
>> kmeans sometimes doesn't converge even with 10 restarts, and
>> the second one is that the image produced is basically all
>> read (I'm using the standard color scheme), not to mention
>> its size is massive and very hard to deal with. Does anyone
>> have any suggestions on how I could accomplish this task
>> efficiently, or is this data just too big to cluster?
> Genesis should be able to handle datasets that large.
> Adapting the color scale is very easy.
> I can't comment on the convergence of k-means, this could depend
> on the data.
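On the convergence question, one thing worth checking (an assumption, since the original code isn't shown) is whether the warning comes from kmeans()'s per-start iteration limit rather than from the number of restarts. In stats::kmeans() the defaults are iter.max = 10 and nstart = 1, and the "did not converge in 10 iterations" warning refers to iter.max. A minimal sketch on simulated stand-in data (not the poster's matrix):

```r
## Hypothetical stand-in data (not the poster's): 2000 genes x 50 conditions.
set.seed(1)
x <- matrix(rnorm(2000 * 50), nrow = 2000, ncol = 50)

## stats::kmeans() defaults are iter.max = 10 and nstart = 1. The warning
## "did not converge in 10 iterations" refers to iter.max, the per-start
## iteration cap, not to the number of random starts (nstart).
km <- kmeans(x, centers = 6, iter.max = 100, nstart = 10)

km$iter   # iterations used by the best of the 10 starts
km$size   # number of genes in each of the 6 clusters
```

Raising iter.max gives each start room to finish; nstart only controls how many random starts are tried and the best one is kept.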
I'd guess the 'all read' (? red) is due to a few extreme values driving
the color palette; perhaps you meant to log-transform or otherwise
pre-process the data before clustering and display, which might also help
convergence? Likewise, consider applying a filter such as varFilter from
the genefilter package to reduce the number of genes being clustered; most
genes will not contribute anything meaningful to the clustering algorithm.
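A minimal base-R sketch of that pre-processing, assuming the data are positive raw intensities (simulated here) held in a plain genes x conditions matrix. genefilter's varFilter works on an ExpressionSet; the inter-quartile-range filter below only mimics the idea without that package, and the 50% cutoff is an illustrative choice:

```r
## Hypothetical raw intensities: positive and heavily right-skewed, so a few
## extreme values would dominate a linear color scale.
set.seed(2)
intens <- matrix(rlnorm(2000 * 50, meanlog = 6, sdlog = 1.5), nrow = 2000)

## Log-transform to compress the extreme values (requires all values > 0).
lx <- log2(intens)

## Keep roughly the most variable half of the genes, ranked by IQR.
iqr  <- apply(lx, 1, IQR)
keep <- iqr > quantile(iqr, 0.5)
fx   <- lx[keep, , drop = FALSE]

dim(fx)   # fewer rows to cluster, same 50 conditions
```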
I think what you want to do is separate the steps of clustering,
reordering rows / columns, and displaying the image. See ?dendrogram,
?reorder, ?heatmap. heatmap should then be doing little more than plotting
an image (there's no sense in drawing the dendrograms, as they'll be too
dense to make sense of).
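Those three steps might be sketched as follows, assuming an already pre-processed matrix (simulated here). Clipping the color scale at the 1% and 99% quantiles and the blue/white/red palette are illustrative choices, not anything prescribed by heatmap(). Rowv = NA and Colv = NA tell heatmap() not to compute dendrograms or reorder, and scale = "none" overrides its default row scaling so the fixed breaks apply:

```r
## Hypothetical pre-processed matrix (genes x conditions).
set.seed(3)
x <- matrix(rnorm(1000 * 50), nrow = 1000)

## 1. Cluster the genes (rows).
km <- kmeans(x, centers = 6, iter.max = 100, nstart = 10)

## 2. Reorder rows so genes in the same cluster are adjacent.
ord <- order(km$cluster)
xo  <- x[ord, ]

## 3. Fix the color scale to central quantiles so a few outliers cannot
##    push everything else to one end of the palette.
lim        <- quantile(xo, c(0.01, 0.99))
xo_clipped <- pmin(pmax(xo, lim[1]), lim[2])   # dim is kept from xo
cols       <- colorRampPalette(c("blue", "white", "red"))(64)
brks       <- seq(lim[1], lim[2], length.out = length(cols) + 1)

## 4. Display only: no dendrograms, no further reordering, no rescaling.
pdf(file.path(tempdir(), "heatmap.pdf"), width = 7, height = 10)
heatmap(xo_clipped, Rowv = NA, Colv = NA, scale = "none",
        col = cols, breaks = brks, labRow = rep("", nrow(xo_clipped)))
dev.off()
```

Writing to a file device keeps the large image out of the interactive session.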
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109
Location: Arnold Building M1 B861
Phone: (206) 667-2793