[BioC] cross validation / bootstrap after classification

Fri Nov 4 13:51:22 CET 2005

On 11/4/05 7:40 AM, "Heike Pospisil" <pospisil at zbh.uni-hamburg.de> wrote:

> Hello Bioconductors,
> 
> I used a t-test and/or SAM to find significant genes describing the differences
> between hgu133plus2 chips from two different phenotypic classes. The resulting
> heatmaps show promising clustering.
> 
> Now, I would like to confirm these clusters and to estimate the robustness of
> this clustering by cross-validation and/or bootstrapping(*). For that, I have
> two questions:
> 
> 1) Is there an appropriate package and/or source for performing
> cross-validation and/or bootstrapping?
> 
> 2) What is the right measure for rating the quality of such a clustering? So
> far, I have looked at the cluster plots(**) and decided whether it was a good
> or a bad clustering.

Heike,

If I understand what you did, there is a major problem with your logic, I
think.  You are using the genes from a SUPERVISED analysis to do your
clustering.  There SHOULD be clustering and the strength of the clustering
is already measured by the number of significant genes from your SAM
analysis.  In other words, you told SAM to define genes that divide your two
groups and then asked hierarchical clustering to give you its best guess
as to the clustering given those genes--of course you will get back a
clustering very close to the clusters that you gave SAM (if, indeed, there
is any difference between the two groups).  So, there is no point in
determining the significance of the heatmap clustering--it doesn't represent
an unsupervised analysis anymore.
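To illustrate the circularity, here is a minimal sketch in Python (standard library only). It is not SAM or any Bioconductor function. A pooled two-sample t-statistic stands in for SAM, plain 2-means stands in for hierarchical clustering, and all names and parameters (t_statistic, two_means, agreement, circular_demo, n_selected, seed) are hypothetical. The simulated data are pure noise with no true difference between the classes, yet the genes picked using the class labels tend to split the samples along those same labels. Agreement well above the 0.5 expected by chance therefore says little about the robustness of the clustering.

```python
import math
import random


def t_statistic(a, b):
    """Pooled-variance two-sample t-statistic.

    A simplified stand-in for SAM's statistic (no fudge factor s0)."""
    na, nb = len(a), len(b)
    ma, mb = sum(a) / na, sum(b) / nb
    ss = sum((x - ma) ** 2 for x in a) + sum((x - mb) ** 2 for x in b)
    pooled_var = ss / (na + nb - 2)
    se = math.sqrt(pooled_var * (1 / na + 1 / nb))
    return (ma - mb) / se


def _dist2(u, v):
    """Squared Euclidean distance between two samples."""
    return sum((x - y) ** 2 for x, y in zip(u, v))


def two_means(samples, iterations=20):
    """Plain 2-means (Lloyd's algorithm); returns a 0/1 cluster label per sample."""
    # Deterministic start: the first sample and the sample farthest from it.
    c0 = samples[0]
    c1 = max(samples, key=lambda s: _dist2(s, c0))
    centers = [list(c0), list(c1)]
    labels = [0] * len(samples)
    for _ in range(iterations):
        # Assign each sample to its nearest center.
        labels = [0 if _dist2(s, centers[0]) <= _dist2(s, centers[1]) else 1
                  for s in samples]
        # Move each center to the mean of its members (keep it if empty).
        for k in (0, 1):
            members = [s for s, lab in zip(samples, labels) if lab == k]
            if members:
                centers[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def agreement(true_labels, cluster_labels):
    """Fraction of samples where clusters match classes, ignoring label swaps."""
    hits = sum(t == c for t, c in zip(true_labels, cluster_labels))
    frac = hits / len(true_labels)
    return max(frac, 1 - frac)


def circular_demo(n_per_group=10, n_genes=2000, n_selected=50, seed=1):
    rng = random.Random(seed)
    labels = [0] * n_per_group + [1] * n_per_group
    # Pure noise (genes x samples): no gene truly differs between the classes.
    data = [[rng.gauss(0, 1) for _ in labels] for _ in range(n_genes)]
    # Supervised step: rank genes by |t| computed WITH the class labels.
    scores = []
    for row in data:
        g0 = [x for x, lab in zip(row, labels) if lab == 0]
        g1 = [x for x, lab in zip(row, labels) if lab == 1]
        scores.append(abs(t_statistic(g0, g1)))
    top = sorted(range(n_genes), key=lambda g: scores[g], reverse=True)[:n_selected]
    # "Unsupervised" step: cluster the samples using only the selected genes.
    samples = [[data[g][j] for g in top] for j in range(len(labels))]
    return agreement(labels, two_means(samples))


print(f"agreement with the class labels: {circular_demo():.2f}")
```

The exact number depends on seed, n_genes and n_selected; the point is only that any clustering of label-selected genes starts from this built-in bias.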

Hope that helps a bit.

Sean