[R] kmeans cluster analysis. How do I (1) determine the probability of cluster membership and (2) determine cluster membership for a new subject?

John Sorkin jsorkin at grecc.umaryland.edu
Tue Oct 2 20:56:06 CEST 2012

Thank you!
I just wanted to know how one goes from the values returned by kmeans to a distance metric. You have shown me that it is simply the squared distance from each cluster center! Thanks again.
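Question (2) in the subject line, assigning a new subject, follows from the same rule: scale the new subject exactly as the original data were scaled, then pick the nearest center. The following is a minimal sketch under stated assumptions. The four numeric iris columns stand in for the thread's data. `datascaled` is assumed to come from scale(), which stores its "scaled:center" and "scaled:scale" attributes. `newsubj`, `newscaled`, `d2new` and `newcluster` are hypothetical names, and the new subject's measurements are made up for illustration.

```r
# Stand-in data (assumption): iris replaces the thread's data set
set.seed(42)
datascaled <- scale(iris[, 1:4])
fit <- kmeans(datascaled, centers = 3, nstart = 10)

# Hypothetical new subject, on the original (unscaled) measurement scale
newsubj <- c(Sepal.Length = 6.0, Sepal.Width = 3.0,
             Petal.Length = 4.5, Petal.Width = 1.5)

# Scale with the TRAINING data's means and SDs, not the new subject's own
newscaled <- (newsubj - attr(datascaled, "scaled:center")) /
  attr(datascaled, "scaled:scale")

# Squared Euclidean distance to each center (columns of t(fit$centers)),
# then the index of the nearest center is the predicted cluster
d2new <- colSums((t(fit$centers) - newscaled)^2)
newcluster <- which.min(d2new)

print(d2new)
print(newcluster)
```

Scaling the new subject by its own mean and SD would be meaningless for a single observation, which is why the stored attributes are reused.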

John David Sorkin M.D., Ph.D.
Chief, Biostatistics and Informatics
University of Maryland School of Medicine Division of Gerontology
Baltimore VA Medical Center
10 North Greene Street
Baltimore, MD 21201-1524
(Phone) 410-605-7119
(Fax) 410-605-7913 (Please call phone number above prior to faxing)

>>> Ranjan Maitra <maitra.mbox.ignored at inbox.com> 10/2/2012 2:52 PM >>>
On Tue, 2 Oct 2012 14:32:12 -0400 John Sorkin
<jsorkin at grecc.umaryland.edu> wrote:

> Ranjan,
> Thank you for your help. What eludes me is how one computes the distance from each cluster for each subject. For my first subject, datascaled[1,], I have tried to use the following: 
> v1 <- sum(fit$centers[1,]*datascaled[1,])
> v2 <- sum(fit$centers[2,]*datascaled[1,])
> v3 <- sum(fit$centers[3,]*datascaled[1,])
> hoping the max(v1,v2,v3) would reproduce the group assignment, i.e. simply assign the subject to the group that gives the largest value, but it does not. How is the distance to the three clusters computed for each subject?
> Thanks,
> John 
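Why the sums of products in the question do not reproduce the assignment: kmeans assigns a subject to the center with the smallest squared Euclidean distance, and expanding that distance shows how it relates to the inner product. Here x is a subject's scaled row (e.g. datascaled[1, ]) and c_k is row k of fit$centers; this notation is introduced here, not in the thread.

```latex
\[
\lVert x - c_k \rVert^2 = \sum_j (x_j - c_{kj})^2
  = \lVert x \rVert^2 - 2\,x^\top c_k + \lVert c_k \rVert^2
\]
\[
\arg\min_k \lVert x - c_k \rVert^2
  = \arg\max_k \Bigl( x^\top c_k - \tfrac{1}{2}\lVert c_k \rVert^2 \Bigr)
\]
```

The ||x||^2 term is the same for every cluster, so the largest inner product x^T c_k picks the nearest center only when all centers have the same length; otherwise the (1/2)||c_k||^2 term can change the winner.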

Well, it should be:

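v <- numeric(3)   # squared Euclidean distances from subject 1 to the 3 centers
for (i in 1:3)
   v[i] <- sum((fit$centers[i, ] - datascaled[1, ])^2)
which.min(v)      # index of the nearest center

The nearest center, which.min(v), should provide the cluster assignment.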
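A self-contained version of the loop above, as a sketch: the thread's `datascaled` and `fit` are not shown, so the built-in iris measurements stand in for them (an assumption). It uses `algorithm = "Lloyd"` because a converged Lloyd run leaves every subject with its nearest center, so the comparison with `fit$cluster` is exact. With the default Hartigan-Wong algorithm the nearest-center rule usually, but not always, matches `fit$cluster`, since that algorithm only moves a subject when the move lowers the total within-cluster sum of squares.

```r
# Stand-in data (assumption): iris replaces the thread's data set
set.seed(42)
datascaled <- scale(iris[, 1:4])

# Lloyd: at convergence each subject is assigned to its nearest center
fit <- kmeans(datascaled, centers = 3, algorithm = "Lloyd",
              iter.max = 100, nstart = 10)

# Squared Euclidean distance from subject 1 to each cluster center
k <- nrow(fit$centers)
v <- numeric(k)
for (i in seq_len(k))
  v[i] <- sum((fit$centers[i, ] - datascaled[1, ])^2)

print(v)
cat("nearest center:", which.min(v),
    " kmeans assignment:", fit$cluster[1], "\n")
```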

By the way, there is a better, more efficient and automated way to do this
for all subjects at once: avoid the loop by using matrices, arrays and
apply(). I have not bothered with that here.
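One way to do what the reply describes, sketched under the same stand-in assumptions (iris data, Lloyd's algorithm): apply() over the rows of fit$centers builds an n-by-k matrix of squared distances, and a second apply() picks the nearest center for every subject. `d2` and `assigned` are names introduced here for illustration.

```r
# Stand-in data (assumption): iris replaces the thread's data set
set.seed(42)
datascaled <- scale(iris[, 1:4])
fit <- kmeans(datascaled, centers = 3, algorithm = "Lloyd",
              iter.max = 100, nstart = 10)

# n x k matrix of squared distances (row = subject, column = center).
# For each center ctr, subtract it from every subject (the columns of
# t(datascaled)) and sum the squared differences down each column.
d2 <- apply(fit$centers, 1,
            function(ctr) colSums((t(datascaled) - ctr)^2))

# Nearest center for every subject at once
assigned <- apply(d2, 1, which.min)

# Cross-tabulate against the assignment kmeans reported
print(table(assigned, fit$cluster))

# Each subject's smallest squared distance is its share of tot.withinss
print(all.equal(sum(apply(d2, 1, min)), fit$tot.withinss))
```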


More information about the R-help mailing list