[BioC] How to do clustering

Thomas Girke thomas.girke at ucr.edu
Wed Jun 13 17:39:16 CEST 2007

Dear Alex,
In addition to Sean's advice, I would like to point out that the
sample you give below indicates that you are trying to pass both a
column dendrogram and a row dendrogram to the heatmap function. With your
matrix of 238,000 rows by 49 columns you should compute only a column
dendrogram, because the row dendrogram would take more than 200 GB of memory to
calculate. You can still use the heatmap or heatmap.2 functions if you turn off
the row sorting by setting the Rowv argument to NA. In addition, I would
consider filtering your rows in a meaningful manner down to a much smaller
number, perhaps by using R's IQR function to remove all rows with very
low variability. I am suggesting this because you won't see any
patterns in the heatmap when you have so many rows. If the row filtering
works, then you could generate a dendrogram for the row dimension as well.
Remember: hclust will require ~4 GB of memory to cluster ~30,000 items
and < 1 GB for 10,000 items, and pvclust, which uses hclust internally, will
need considerably more than this.
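
To make the points above concrete, here is a minimal sketch on simulated
data. The matrix size, the 90% IQR cutoff and the object names (x_filt,
hc_col, hr, dist_gb) are only illustrative assumptions, so please adapt
them to your own data. The memory figure is a rough estimate for the
distance object alone (n*(n-1)/2 values at 8 bytes each), not for the
whole R session.

```r
# Simulated stand-in for the real data (assumption: 2,000 rows x 49 columns)
set.seed(1)
x <- matrix(rnorm(2000 * 49), nrow = 2000,
            dimnames = list(paste0("row", 1:2000), paste0("s", 1:49)))

# Rough size in GB of the distance object needed to cluster n items
dist_gb <- function(n) n * (n - 1) / 2 * 8 / 1e9
dist_gb(c(10000, 30000, 238000))   # roughly 0.4, 3.6 and 227 GB

# Column dendrogram only: Rowv = NA turns off row clustering and sorting
hc_col <- hclust(dist(t(x)))
heatmap(x, Rowv = NA, Colv = as.dendrogram(hc_col), scale = "row")

# Keep only the most variable rows (here: the top 10% by IQR)
row_iqr <- apply(x, 1, IQR)
x_filt <- x[row_iqr > quantile(row_iqr, 0.9), , drop = FALSE]

# With far fewer rows, a row dendrogram becomes affordable as well
hr <- hclust(dist(x_filt))
heatmap(x_filt, Rowv = as.dendrogram(hr), Colv = as.dendrogram(hc_col),
        scale = "row")
```

The cutoff is arbitrary here; a sensible threshold depends on the data.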

As more general advice: when working with large data sets in R, always subset
your data to something very small to test out your strategy first, because this
will save you a lot of time.
In your case, this could be done by selecting just the first 100 rows of your
matrix like this:
		my_matrix <- my_matrix[1:100, ]

Once you have tested things out, just remove the '[1:100, ]' part from
your script/protocol.
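
A possible variant of the same idea, sketched on simulated data: assign
the subset to a new object so that the full matrix stays available, and
time the clustering step on the small subset first. The name test_matrix
and the simulated my_matrix are assumptions for illustration only.

```r
# Simulated stand-in for the real matrix (assumption: 5,000 rows x 49 columns)
set.seed(1)
my_matrix <- matrix(rnorm(5000 * 49), nrow = 5000)

# Work on a small copy; the full matrix is left untouched
test_matrix <- my_matrix[1:100, ]

# Time the clustering step on the subset before running it on more rows
system.time(hc_test <- hclust(dist(test_matrix)))

# Quick sanity checks on the subset and the result
dim(test_matrix)
length(hc_test$order)
```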



On Wed 06/13/07 06:02, Sean Davis wrote:
> ssls sddd wrote:
> > Dear Dr.Thomas Girke,
> > 
> > I have one more question for you. I tried pvclust in the session of
> > 'Obtain significant clusters by pvclust bootstrap analysis' for my data, x.
> > 
> > But I have a problem with:
> > 
> > heatmap(x, Rowv=dend_colored, Colv=as.dendrogram(hc), col=my.colorFct(),
> > scale="row", RowSideColors=mycolhc)
> > 
> > the error was:
> > 
> > error in heatmap(x, Rowv = dend_colored, Colv = as.dendrogram(hc), col =
> > my.colorFct(),  :
> >         'x' must be a numeric matrix
> > 
> > I ran 'x[1:3,1:3]' and it produced the following:
> > 
> >               AIRNS_A09 AIRNS_A11 AIRNS_A12
> > SNP_A-1780271   1.85642   1.50956   1.73154
> > SNP_A-1780274   1.72140   1.83712   1.85948
> > SNP_A-1780277   2.04241   1.53458   1.65270
> > 
> > I think x is a numeric matrix. Where do you think I may have gone wrong?
> Try coercing the x into a matrix directly:
> heatmap(as.matrix(x), Rowv=dend_colored, Colv=as.dendrogram(hc),
> col=my.colorFct(), scale="row", RowSideColors=mycolhc)
> Does this fix the problem?  You can always check the class of an object
> by doing something like:
> class(x)
> which should report:
> [1] "matrix"
> Hope that helps.
> Sean

Dr. Thomas Girke
Assistant Professor of Bioinformatics
Director, IIGB Bioinformatic Facility
Center for Plant Cell Biology (CEPCEB)
Institute for Integrative Genome Biology (IIGB)
Department of Botany and Plant Sciences
1008 Noel T. Keen Hall
University of California
Riverside, CA 92521

E-mail: thomas.girke at ucr.edu
Website: http://faculty.ucr.edu/~tgirke
Ph: 951-827-2469
Fax: 951-827-4437
