[R] Clustering methods for data that has bimodal distribution

Adrian Johnson oriolebaltimore at gmail.com
Mon Dec 5 04:52:33 CET 2016


Dear group,
pardon me for a naive question. I have data matrix (11K rows , 4K columns).
The data range is between -1 to 1. Not strictly integers, but real
numbers with at least place values in millionths.

The data distribution is peculiar (if I do plot(density(myMatrix)),  I
get nice bimodal curve (nice standard distribution between -1 and 0
and another curve between 0 and 1) .

I am interested in clustering the data (using conesnsus clustering
(that uses K-means)).

My question are:

1. If my data is range is between -1  and 1. Is K-means appropriate
method. considering if the data might have ties.

2. Although K-means is non-parametric, would a bimodal distributed
data be okay as input to K-means.

I appreciate any suggestion.
Thanks
Adrian.



More information about the R-help mailing list