[BioC] Clustering and gene modules

Sun Jan 2 00:16:24 CET 2005

New Year greetings to all. 

I have a problem that I am not sure how best to solve, and I hope to
get some advice from the list.

I have 200 oligonucleotide arrays, each measuring about 13,000
transcripts, from samples belonging to approximately 6 different cancer
subtypes. Essentially, I am hoping first to identify "gene modules" of
gene expression corresponding to a specific cancer subtype or to a
group of subtypes (e.g. present only in cancers A and B, but not in C,
D, E or F). Subsequently, I wish to label these modules with Gene
Ontology terms (e.g. a "T-cell response" module).
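One way to make "present only in A and B" concrete is to score each sample by the mean expression of a module's genes, then compare the samples from the chosen subtypes against all other samples. The Python sketch below is a hypothetical illustration: the function names, the mean-based module score and the use of Welch's t statistic are assumptions, not the method used by GenXpress or any Bioconductor package. A real analysis would also turn the statistic into a p-value and correct for the many subtype combinations being tested.

```python
from math import sqrt
from statistics import mean, variance

def module_scores(expr, module_genes):
    """Per-sample module score: mean expression of the module's genes.

    Hypothetical layout: expr maps gene id -> list of per-sample values,
    with the same sample order for every gene.
    """
    n_samples = len(next(iter(expr.values())))
    return [mean(expr[g][j] for g in module_genes) for j in range(n_samples)]

def welch_t(a, b):
    """Welch's t statistic for mean(a) - mean(b), allowing unequal variances."""
    return (mean(a) - mean(b)) / sqrt(variance(a) / len(a) + variance(b) / len(b))

def subtype_contrast(scores, labels, in_group):
    """Compare scores of samples whose subtype is in in_group with all other samples."""
    inside = [s for s, lab in zip(scores, labels) if lab in in_group]
    outside = [s for s, lab in zip(scores, labels) if lab not in in_group]
    return welch_t(inside, outside)

# Toy data: two module genes that are high in subtypes A and B only.
expr = {"g1": [5, 6, 5, 6, 1, 3], "g2": [6, 5, 5, 6, 2, 1]}
labels = ["A", "A", "B", "B", "C", "D"]
scores = module_scores(expr, ["g1", "g2"])
print(scores, subtype_contrast(scores, labels, {"A", "B"}))
```

A large positive statistic for the group {A, B} and small values for other groupings would mark the module as specific to A and B.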

I tried a non-R program (GenXpress), which has been used in work
published in Nature Genetics, but it froze and glitched repeatedly on
the cancer data posted online alongside the program (I am not sure
whether this is a Windows issue on my side).

I was thinking of the following approach. First, filter the
transcripts by variation and minimum expression. Then perform
hierarchical clustering on the remaining gene set and keep only gene
clusters above a minimum size (e.g. 20 transcripts). Next, sift through
these clusters for "modules", i.e. subclusters that separate particular
combinations of cancers A, B, C, D, E and F at a minimum significance
level. Finally, use the package gocluster to identify the relevant
annotations for each of these clusters.
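The filtering, clustering and size-threshold steps can be sketched in plain Python. This is a minimal illustration under stated assumptions: "variation" is taken to mean the standard deviation across samples, the distance is 1 minus the Pearson correlation, clusters are merged by average linkage until the closest pair is farther apart than a cut height, and all function names and thresholds are hypothetical. The naive merging loop is far too slow for 13,000 transcripts; in R, hclust() and cutree() from the stats package do the clustering and cutting efficiently.

```python
from math import sqrt
from statistics import mean, pstdev

def filter_genes(expr, min_expr, min_sd):
    """Keep genes whose maximum expression reaches min_expr and whose standard
    deviation across samples is at least min_sd (min_sd > 0 drops constant genes,
    which would otherwise break the correlation below)."""
    return {g: v for g, v in expr.items() if max(v) >= min_expr and pstdev(v) >= min_sd}

def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
    return num / den

def average_linkage_clusters(expr, cut):
    """Naive agglomerative clustering with distance 1 - Pearson r and average
    linkage; stops once the closest pair of clusters is farther apart than cut."""
    genes = list(expr)
    dist = {(a, b): 1 - pearson(expr[a], expr[b]) for a in genes for b in genes if a < b}

    def d(a, b):
        return dist[(a, b) if a < b else (b, a)]

    clusters = [[g] for g in genes]
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                link = mean(d(a, b) for a in clusters[i] for b in clusters[j])
                if best is None or link < best[0]:
                    best = (link, i, j)
        link, i, j = best
        if link > cut:
            break
        clusters[i] = clusters[i] + clusters[j]  # j > i, so index i stays valid
        del clusters[j]
    return clusters

def large_clusters(clusters, min_size):
    """Keep only clusters with at least min_size genes."""
    return [c for c in clusters if len(c) >= min_size]

expr = {
    "up1": [1, 2, 3, 4, 5, 6], "up2": [2, 3, 4, 5, 6, 7], "up3": [1, 2, 3, 4, 6, 7],
    "down1": [6, 5, 4, 3, 2, 1], "down2": [7, 6, 5, 4, 3, 1],
    "flat": [5, 5, 5, 5, 5, 5], "low": [0.1, 0.2, 0.1, 0.2, 0.1, 0.2],
}
kept = filter_genes(expr, min_expr=1, min_sd=0.5)
clusters = average_linkage_clusters(kept, cut=0.5)
print(large_clusters(clusters, min_size=3))
```

On this toy input, "flat" and "low" are filtered out, the rising and falling genes form two clusters, and only the three-gene cluster passes a minimum size of 3.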

Any advice would be greatly appreciated. Thank you!

Regards,
Min-Han Tan 
Van Andel Institute, MI