[BioC] semi-supervised clustering

friedrich.leisch at stat.uni-muenchen.de
Fri Oct 26 09:33:38 CEST 2007


>>>>> On Thu, 25 Oct 2007 09:12:32 -0700 (PDT),
>>>>> Tim Smith (TS) wrote:

  > Hi,
  > Is there any package that implements semi-supervised clustering
  > through 'must-link' and 'cannot-link' constraints?

Package flexclust on CRAN can do constrained clustering. The feature
is not well documented in the current release version, but

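  library(flexclust)
  # x: data matrix, k: number of clusters, mygroups: one group label per row of x
  myfam <- kccaFamily("kmeans", groupFun = "minSumClusters")
  clres <- kcca(x, k, myfam, group = mygroups)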

will assign all points that belong to the same group to the same
cluster (a 'must-link' constraint) using k-means; flexclust can also
use distances other than Euclidean.
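
For anyone who wants to try this, here is a minimal sketch of the call
on toy data. The data, the choice of k = 3 and the pairing of
consecutive rows into groups are all made up for illustration; only
kccaFamily(), kcca() and clusters() come from flexclust.

```r
library(flexclust)

set.seed(1)
# Toy data (made up): 60 points in 2 dimensions around three centers.
x <- rbind(matrix(rnorm(40, mean = 0), ncol = 2),
           matrix(rnorm(40, mean = 4), ncol = 2),
           matrix(rnorm(40, mean = 8), ncol = 2))

# Hypothetical must-link groups: each consecutive pair of rows shares a label.
mygroups <- rep(seq_len(nrow(x) / 2), each = 2)

myfam <- kccaFamily("kmeans", groupFun = "minSumClusters")
clres <- kcca(x, k = 3, family = myfam, group = mygroups)

# For every group, check whether its members ended up in more than one cluster.
split_groups <- tapply(clusters(clres), mygroups,
                       function(cl) length(unique(cl)) > 1)

# Expected to be FALSE if the must-link constraint is honored.
any(split_groups)
```

The check at the end is only a sanity test; it says nothing about
the quality of the clustering itself.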

  groupFun = "minSumClusters" assigns each group to the cluster whose
             center has minimal average distance to all group members.

  groupFun = "majorityClusters" assigns all group members to the
             cluster the majority of them belongs to.

  groupFun = "differentClusters" implements a 'cannot-link'
             constraint. Obviously, the group sizes must be smaller
             than the number of clusters in this case.
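
To make the three rules concrete, here is a rough base-R sketch of how
one group could be assigned given its distances to the cluster centers.
This is not the flexclust source: the function names and the distance
values are hypothetical, and the greedy scheme used for the
'cannot-link' case is only an assumption about one possible way to do
it.

```r
# Distances from the 3 members of one group (rows) to 3 cluster
# centers (columns). The numbers are made up for illustration.
d <- matrix(c(1.0, 4.0, 2.5,
              1.2, 3.0, 2.0,
              5.0, 0.5, 2.1),
            nrow = 3, byrow = TRUE)

# "minSumClusters"-style rule: the whole group goes to the cluster
# whose center has the smallest average distance to the members.
min_sum_cluster <- function(d) {
  rep(which.min(colMeans(d)), nrow(d))
}

# "majorityClusters"-style rule: every member votes for its nearest
# center and the whole group goes to the most frequent vote
# (ties go to the lowest cluster index).
majority_cluster <- function(d) {
  votes <- apply(d, 1, which.min)
  winner <- which.max(tabulate(votes, nbins = ncol(d)))
  rep(winner, nrow(d))
}

# "differentClusters"-style rule (ASSUMPTION: simple greedy scheme):
# members are processed in order and each takes the nearest cluster
# that no earlier member of the group has taken.
different_clusters <- function(d) {
  stopifnot(nrow(d) <= ncol(d))
  assigned <- integer(nrow(d))
  free <- rep(TRUE, ncol(d))
  for (i in seq_len(nrow(d))) {
    cand <- which(free)
    assigned[i] <- cand[which.min(d[i, cand])]
    free[assigned[i]] <- FALSE
  }
  assigned
}

min_sum_cluster(d)     # 3 3 3
majority_cluster(d)    # 1 1 1
different_clusters(d)  # 1 3 2
```

Note that the first two rules disagree on this example: the third
member is far from center 1, which pulls the average towards
cluster 3, while the majority vote still picks cluster 1.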

Some details on the algorithms used can be found in

	http://www.ci.tuwien.ac.at/papers/Leisch+Gruen-2006.pdf

Hope this helps,
Fritz


