# [R] kmeans cluster analysis. How do I (1) determine probability of cluster membership (2) determine cluster membership for a new subject

Ranjan Maitra maitra.mbox.ignored at inbox.com
Tue Oct 2 19:59:51 CEST 2012

```
John,

On Tue, 2 Oct 2012 11:35:12 -0400 John Sorkin
<jsorkin at grecc.umaryland.edu> wrote:

> Windows XP
> R 2.15
>
> I am running a cluster analysis in which I ask for three clusters (see code below). The analysis nicely tells me what cluster each of the subjects in my input dataset belongs to. I would like two pieces of information:
> (1) for every subject in my input data set, what is the probability of the subject belonging to each of the three clusters?

K-means provides hard clustering: whichever cluster has the closest
mean gets the assignment, so it does not yield membership
probabilities.

> (2) given a new subject, someone who was not in my original dataset, how can I determine their cluster assignment?

Look at the distance between the subject and the cluster means: the
subject is assigned to the cluster whose mean is closest.

If you are looking for probabilistic clustering (under Gaussian
mixture model assumptions), you could use model-based clustering: one R
package is mclust.

Btw, note that kmeans is very sensitive to initialization (as is
mclust): you may want to try several random starts (for kmeans),
at the very least. Use the argument "nstart" with a huge number.

HTH,
Ranjan

> Thanks,
> John
>
> # K-Means Cluster Analysis
> jclusters <- 3
> fit       <- kmeans(datascaled, jclusters) # 3 cluster solution
>
> and fit$cluster tells me what cluster each observation in my input dataset belongs to (output truncated for brevity):
>
> > fit$cluster
>   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17 . . . .
>   1   1   1   1   3   1   1   1   1   2   1   2   1   1   1   1   1 . . . .
>
> How do I get the probability of being in cluster 1, cluster 2, and cluster 3 for a given subject, e.g. datascaled[1,]?
> How do I get the cluster assignment for a new subject?
>
> Thanks,
> John
> John David Sorkin M.D., Ph.D.
> Chief, Biostatistics and Informatics
> University of Maryland School of Medicine Division of Gerontology
> Baltimore VA Medical Center
> 10 North Greene Street
> GRECC (BT/18/GR)
> Baltimore, MD 21201-1524
> (Phone) 410-605-7119
> (Fax) 410-605-7913 (Please call phone number above prior to faxing)

```
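The nearest-mean rule described in the reply can be sketched in base R as follows. This is only an illustration under stated assumptions: the data are simulated stand-ins for the poster's `datascaled`, `assign_cluster()` is a hypothetical helper (not part of `kmeans`), `nstart = 25` is an arbitrary choice, and the new subject's values are made up. It also assumes `datascaled` came from `scale()`, so the new subject has to be transformed with the same centring and scaling before distances are compared.

```r
set.seed(42)  # for a reproducible illustration only

# Simulated stand-in for the poster's data (assumption): 3 groups, 2 variables
raw <- rbind(
  matrix(rnorm(60, mean = 0),  ncol = 2),
  matrix(rnorm(60, mean = 5),  ncol = 2),
  matrix(rnorm(60, mean = 10), ncol = 2)
)
datascaled <- scale(raw)

# Several random starts, as suggested in the reply; 25 is an arbitrary value
fit <- kmeans(datascaled, centers = 3, nstart = 25)

# Hypothetical helper: index of the cluster mean nearest to x (Euclidean)
assign_cluster <- function(x, centers) {
  d2 <- rowSums(sweep(centers, 2, x)^2)  # squared distance to each mean
  unname(which.min(d2))
}

# A made-up new subject, measured on the raw scale, is transformed with the
# centring and scaling values stored by scale() on the original data
new_raw    <- c(5.2, 4.7)
new_scaled <- (new_raw - attr(datascaled, "scaled:center")) /
  attr(datascaled, "scaled:scale")

new_cluster <- assign_cluster(new_scaled, fit$centers)
new_cluster
```

Cluster labels are arbitrary and can change between runs, so only the grouping is meaningful, not the label number. This rule still gives a hard assignment; for membership probabilities the reply's suggestion of model-based clustering applies, where, for instance, a fit from `Mclust()` in the mclust package stores posterior membership probabilities in its `z` component.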