helprhelp at gmail.com
Mon Jul 25 23:45:12 CEST 2005
I have a question about the clustering methods available in R. I am
trying to down-sample the majority class in a classification problem
on an imbalanced dataset. Since I don't want to lose information from
the original dataset, I don't want to use naive (random) down-sampling.
I think clustering the majority class and keeping one "representative"
sample per cluster might help. So my question is: which clustering
methods should I test to get the best result? I think the key issue
might be the choice of "distance", considering that in the next step
I would like to use decision trees.
Please share your experience with clustering for this purpose. (Any
available implementation outside R is also welcome.)
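
To make the idea concrete, here is a rough sketch of what I have in mind,
written in Python with only the standard library. It assumes numeric
features and plain k-means with Euclidean distance, which is only one of
the candidates I am asking about. The function name
kmeans_representatives and its parameters (k, iters, seed, dist) are made
up for illustration and do not come from any package. The dist argument
is there so that a different distance can be swapped in.

```python
import math
import random


def kmeans_representatives(points, k, iters=50, seed=0, dist=math.dist):
    """Cluster `points` with plain k-means and return, for each non-empty
    cluster, the index of the real sample closest to its centroid.

    Hypothetical helper for illustration; `dist` is any function that
    takes two points and returns a non-negative number.
    """
    rng = random.Random(seed)
    # Start from k distinct samples chosen at random.
    centroids = [list(points[i]) for i in rng.sample(range(len(points)), k)]
    labels = None
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: dist(p, centroids[c])) for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an emptied cluster keeps its old centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    # Keep one real sample per non-empty cluster: the one nearest the centroid.
    reps = []
    for c in range(k):
        idx = [i for i, lab in enumerate(labels) if lab == c]
        if idx:
            reps.append(min(idx, key=lambda i: dist(points[i], centroids[c])))
    return sorted(reps)


# Toy majority class (made-up numbers): two loose groups of three samples.
majority = [(0.0, 0.1), (0.1, 0.0), (0.2, 0.2),
            (5.0, 5.1), (5.2, 4.9), (4.9, 5.0)]
keep = kmeans_representatives(majority, k=2)
downsampled = [majority[i] for i in keep]
```

The down-sampled majority class would then be combined with the full
minority class before fitting the tree. Returning real samples rather
than the centroids themselves keeps every retained row an actual
observation, which seems safer when the features are later split on by
a decision tree.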
Weiwei Shi, Ph.D
"Did you always know?"
"No, I did not. But I believed..."