[R] Cluster analysis and bootstraps

Simon Blomberg Simon.Blomberg at anu.edu.au
Wed Apr 30 03:46:52 CEST 2003


> It's hard to imagine bootstrapping a confidence interval around a
> categorical value such as a cluster. Perhaps someone else can explain
> this.

Bootstrap support indices are not confidence intervals. In a nutshell, I think the process is:

Assume that there are N objects to be clustered, based on the similarity of C variables measured on each of the N objects.

1. Create a bootstrap dataset by resampling the C variables with replacement, keeping all N objects.

2. Run the clustering algorithm on the bootstrap dataset to cluster the N objects.

3. Repeat steps 1 and 2 a large number of times.

4. Construct a majority-rule consensus tree from all the bootstrapped cluster analyses.

5. Calculate the bootstrap support index for each cluster in the consensus tree as the percentage of bootstrapped cluster analyses in which that cluster was recovered.

See Felsenstein, J. 1985. Confidence limits on phylogenies: an approach using the bootstrap. Evolution 39: 783-791. 

No, I don't know how to do this in R, but I agree that it would be useful!
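
For what it is worth, here is a rough sketch of how the steps above might be approximated with base R only. It is an illustration under assumptions, not a tested implementation: clusters_of() and boot_support() are hypothetical names, Euclidean distance and average linkage are arbitrary choices, and it takes a shortcut at step 4 by scoring the clusters of the tree built from the full data instead of building a majority-rule consensus tree.

```r
# Sketch only: bootstrap support for the clusters of an hclust() tree.
# clusters_of() and boot_support() are hypothetical helper names.

# Return every non-trivial cluster of an hclust tree as a string of
# sorted object indices, e.g. "1,4,7".
clusters_of <- function(hc) {
  n_merges <- nrow(hc$merge)
  members <- vector("list", n_merges)
  for (i in seq_len(n_merges)) {
    m <- integer(0)
    for (j in hc$merge[i, ]) {
      # negative entries are single objects, positive entries earlier merges
      m <- c(m, if (j < 0) -j else members[[j]])
    }
    members[[i]] <- sort(m)
  }
  # drop the last merge: it always contains all N objects
  vapply(members[-n_merges], paste, character(1), collapse = ",")
}

# x: N objects in rows, C variables in columns.
# Assumption: Euclidean distance (dist() default) and average linkage.
boot_support <- function(x, n_boot = 200, method = "average") {
  x <- as.matrix(x)
  ref <- clusters_of(hclust(dist(x), method = method))
  hits <- setNames(numeric(length(ref)), ref)
  for (b in seq_len(n_boot)) {                       # step 3: repeat
    cols <- sample(ncol(x), replace = TRUE)          # step 1: resample variables
    hc_b <- hclust(dist(x[, cols, drop = FALSE]),    # step 2: re-cluster objects
                   method = method)
    hits <- hits + (ref %in% clusters_of(hc_b))      # count recovered clusters
  }
  100 * hits / n_boot                                # step 5: percent support
}

# Example with a built-in dataset (32 cars, 11 variables).
set.seed(1)
support <- boot_support(scale(mtcars), n_boot = 200)
head(sort(support, decreasing = TRUE))
```

The names of the returned vector list the row numbers of the objects in each cluster, and the values are the percentage of bootstrap replicates in which exactly that set of objects formed a cluster. A cluster is counted as recovered only on an exact match of membership.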

Simon.

Simon Blomberg
Depression & Anxiety Consumer Research Unit
Centre for Mental Health Research
Australian National University
http://www.anu.edu.au/cmhr/
Simon.Blomberg at anu.edu.au  +61 (2) 6125 3379


