[BioC] Class discovery options?

Thu Apr 1 10:59:35 CEST 2004

Hi all,

I have (what I think is) a fairly interesting model system that I'm a little
unsure how best to analyze, or which tools to use.  I'm looking for any
advice/ideas/suggestions on techniques/tools that might be applicable.

First, a brief outline of the system:
- rats are given the drug at time 0
- 50% of the rats get sick at 3 weeks
- the key event that decides whether a rat gets sick happens within the first 24 hours
- there is no known way of predicting which rats will get sick

So you can see the problem: if you sacrifice the rats and run arrays at 24
hours, you can't know which rats would have gotten sick and which wouldn't.

Our collaborators ran a number of Affymetrix arrays on rats at an early time point.
When I normalize the data (e.g. with RMA), remove nearly invariant genes
(e.g. require CV > 0.5) and run hierarchical clustering, the animals fall into
two clean groups that look quite different on a heatmap.
Even better, one of those two groups looks much like (and clusters with) 
control animals.
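To pin down the filter-then-cluster step, here is a minimal Python sketch (an illustration, not the code actually used). Assumptions: the data are already normalized (e.g. RMA output) and held as a dict mapping gene id to one value per animal; CV is taken as sample SD divided by |mean|; clustering uses Euclidean distance with average linkage, merging until two groups remain. The names cv_filter, animal_profiles and cluster_two_groups are hypothetical.

```python
import math
import statistics

def cv_filter(expr, min_cv=0.5):
    """Keep genes whose coefficient of variation (sample SD / |mean|) across
    animals exceeds min_cv. expr: gene id -> list of values, one per animal."""
    kept = {}
    for gene, values in expr.items():
        mean = statistics.mean(values)
        if mean == 0:
            continue  # CV undefined for a zero mean; skip the gene
        if statistics.stdev(values) / abs(mean) > min_cv:
            kept[gene] = values
    return kept

def animal_profiles(expr):
    """Transpose gene -> values into one profile (values over genes) per animal."""
    genes = sorted(expr)
    n_animals = len(expr[genes[0]])
    return [[expr[g][i] for g in genes] for i in range(n_animals)]

def cluster_two_groups(profiles):
    """Agglomerative clustering (average linkage, Euclidean distance), merging
    the closest pair of clusters until two remain. Returns sets of animal indices."""
    clusters = [{i} for i in range(len(profiles))]

    def avg_dist(a, b):
        # Average linkage: mean distance over all cross-cluster pairs
        return statistics.mean(math.dist(profiles[i], profiles[j]) for i in a for j in b)

    while len(clusters) > 2:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                d = avg_dist(clusters[x], clusters[y])
                if best is None or d < best[0]:
                    best = (d, x, y)
        _, x, y = best
        clusters[x] |= clusters[y]  # merge y into x
        del clusters[y]             # y > x, so index x stays valid
    return clusters
```

Usage: cluster_two_groups(animal_profiles(cv_filter(expr))) returns two sets of animal indices; deciding which set is the "control-like" one still means comparing against the control arrays, as described above.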

So my question: is there a better way of doing this analysis?  In particular, is
there a way of fitting a model to each gene that would tell me how strongly each
gene contributes to the separation, and whether it is consistent across all rats
or simply randomly perturbed?

My thought on model-fitting was to randomly assign each rat to one of the two
classes (resistant or sensitive), then run the model-fitting.  I would
repeat this for all (well, at least many) possible assignments and then use some
measure of goodness-of-fit for the model (residuals?) to select the best
classification.
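The assignment-search idea above can be sketched as follows, under explicit assumptions: the per-gene model is a two-group means model (a one-way ANOVA with two levels), goodness of fit is the residual sum of squares (RSS) summed over genes, and the best classification is the labeling with the lowest total. The names group_rss, best_split and variance_explained are hypothetical. Fixing animal 0 in group 0 avoids counting each split twice, so n animals give 2^(n-1) - 1 splits with both groups non-empty; that is about half a million for 20 animals, so full enumeration only works for small n.

```python
from itertools import product

def group_rss(values, labels):
    """Residual sum of squares for one gene under a two-group means model."""
    rss = 0.0
    for g in (0, 1):
        members = [v for v, lab in zip(values, labels) if lab == g]
        mean = sum(members) / len(members)
        rss += sum((v - mean) ** 2 for v in members)
    return rss

def best_split(expr, min_group_size=1):
    """Try every two-group labeling of the animals; return the labeling with the
    lowest RSS summed over genes. Animal 0 is fixed to group 0 because swapping
    the two labels gives the same split."""
    if min_group_size < 1:
        raise ValueError("each group needs at least one animal")
    genes = list(expr)
    n = len(expr[genes[0]])
    best_labels, best_total = None, float("inf")
    for rest in product((0, 1), repeat=n - 1):
        labels = (0,) + rest
        size1 = sum(labels)
        if min(size1, n - size1) < min_group_size:
            continue  # skip splits with a group that is too small (or empty)
        total = sum(group_rss(expr[g], labels) for g in genes)
        if total < best_total:
            best_labels, best_total = labels, total
    return best_labels, best_total

def variance_explained(values, labels):
    """R^2 of the two-group model: fraction of one gene's variance explained by the split."""
    mean = sum(values) / len(values)
    tss = sum((v - mean) ** 2 for v in values)
    return 1.0 - group_rss(values, labels) / tss if tss > 0 else 0.0
```

Minimizing total within-group sum of squares is the same objective that 2-means clustering optimizes, so this search is effectively an exhaustive k-means with k = 2. The per-gene variance_explained value is one way to rank genes by their contribution to the chosen split. Because the labeling was picked to fit the data, these values will be optimistic and need a permutation or hold-out check before they can be read as evidence.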

Does that seem reasonable?  Any other thoughts or ideas are very much 
appreciated.

Paul