[BioC] Memory error in Mac OS X Aqua GUI v1.01 with cluster package functions

Jenny Bryan jenny at stat.ubc.ca
Thu Feb 24 23:35:04 CET 2005


Pre-filtering the expression data (usually for evidence of
differential expression) has pretty dramatic effects on the clustering
structure you will find.  If your gene features were 2-dimensional,
rather than 12, you could imagine scatterplotting the genes in the
plane.  A typical screen for differential expression would empty
certain regions of that scatterplot and leave behind a very different
point pattern.  (Depending on the filter and the type of data you're
simulating, it might empty an area around the origin and/or a corridor
along the x = y line.)  It will mostly be *that* structure that will
(possibly) be recovered by the cluster analysis.  The same thing will
be operating in your 12-dimensional gene feature space; it's just a
lot harder to illustrate.
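
To make the 2-dimensional picture concrete, here is a minimal sketch in
plain Python (not R, and not anything from the cluster or genefilter
packages).  The simulated data, the two toy filters and their
thresholds are all assumptions made up for illustration; a real screen
would look different, but the emptied regions show up the same way.

```python
import random

random.seed(0)

# Hypothetical 2-D gene features: (x, y) = expression under two conditions.
genes = [(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(5000)]

def differs(x, y, min_diff=1.0):
    # Toy screen for differential expression: the two values must differ enough.
    return abs(x - y) >= min_diff

def expressed(x, y, min_level=0.5):
    # Toy screen for overall signal: at least one value must be away from zero.
    return max(abs(x), abs(y)) >= min_level

def near_diagonal(x, y, half_width=0.5):
    # Inside a corridor along the x = y line.
    return abs(x - y) < half_width

def near_origin(x, y, half_width=0.5):
    # Inside a small square around the origin.
    return max(abs(x), abs(y)) < half_width

kept = [g for g in genes if differs(*g) and expressed(*g)]

diag_before = sum(near_diagonal(*g) for g in genes)
diag_after = sum(near_diagonal(*g) for g in kept)
origin_before = sum(near_origin(*g) for g in genes)
origin_after = sum(near_origin(*g) for g in kept)

print("genes kept:", len(kept), "of", len(genes))
print("near x = y line, before/after:", diag_before, diag_after)
print("near origin, before/after:", origin_before, origin_after)
```

Whatever clustering you run on `kept` sees a point pattern with those
two regions cleared out, so the gaps left by the filter become part of
the structure it reports.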

Another way to work around your RAM constraint, while still using the
routines in cluster and retaining all your genes, would be to
subdivide your genes into smaller groups in an explicit, supervised
way and then run unsupervised clustering on each group.  You could
then 'manually' merge the results into a global gene clustering.
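
A rough sketch of that split-then-cluster idea, again in plain Python
rather than R.  The grouping rule, the number of clusters per group and
the small k-means routine are all assumptions for illustration;
k-means is only a stand-in for whichever routine from cluster you would
actually call on each group.

```python
import random
from collections import defaultdict

def kmeans(points, k, n_iter=20, seed=0):
    """Plain Lloyd's k-means; returns one cluster index per point."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assign each point to its nearest center (squared Euclidean distance).
        labels = [
            min(range(k),
                key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centers[j])))
            for p in points
        ]
        # Move each center to the mean of its current members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels

def cluster_by_group(features, group_of, k_per_group=3):
    """Cluster each supervised group separately, then merge the labels.

    features: gene id -> feature tuple; group_of: gene id -> group label.
    Returns gene id -> (group label, cluster index within that group).
    """
    by_group = defaultdict(list)
    for gene_id in features:
        by_group[group_of[gene_id]].append(gene_id)
    merged = {}
    for group, ids in sorted(by_group.items()):
        k = min(k_per_group, len(ids))
        labels = kmeans([features[i] for i in ids], k)
        for gene_id, lab in zip(ids, labels):
            merged[gene_id] = (group, lab)
    return merged

# Hypothetical data: 300 genes with 12 features each.
rng = random.Random(1)
features = {f"g{i}": tuple(rng.gauss(0, 1) for _ in range(12)) for i in range(300)}

# Hypothetical supervised split: sign of the mean feature value.
group_of = {g: ("up" if sum(v) > 0 else "down") for g, v in features.items()}

merged = cluster_by_group(features, group_of)
print(len(merged), "genes labelled;", len(set(merged.values())), "clusters overall")
```

Only one group's genes are clustered at a time, so the working set for
each call is smaller than it would be for all the genes at once.  The
merged labels are (group, cluster) pairs; clusters never span two
groups, which is a direct consequence of splitting first.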

Jenny

James W. MacDonald writes:

 > Making the assumption that you are simulating microarray data, I don't 
 > see the purpose of clustering such a large set of data. The usual 
 > approach is to whittle the data down to those genes thought to be 
 > 'important' based on some metric, and then to cluster this smaller 
 > subset. See the genefilter package for some examples.
 > 
 > Jim


