[BioC] probability of a point membership to a certain cluster

Fri Jan 27 16:17:36 CET 2012

Hi,

On Fri, Jan 27, 2012 at 9:55 AM, Barbara Uszczynska
<uszczynska at gmail.com> wrote:
> Hi,
>
> Thanks for reply. The thing is that z matrix (dataset1MC$z) gives only 1.00,
>  if point is classified to particular cluster:
>
>         [,1]         [,2]
> NA12043 1.000000e+00 2.608455e-15
> NA12249 1.000000e+00 7.784309e-15
> NA12264 1.000000e+00 1.664289e-25
> NA12234 3.151495e-19 1.000000e+00
> NA12236 1.000000e+00 4.399892e-21
>
> It means that samples NA12043, NA12249, NA12264, NA12236 are in the same
> group nr 1, and NA12234 is in group nr 2, but there's no information how
> strong they belong to their groups.

But isn't this, perhaps, a function of your data being easy to separate?
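
To make that a bit more concrete, here is a rough sketch (not your data) of the posterior membership in a two-component 1-D Gaussian mixture with equal weights and equal variance. The helper name `posterior1` and the means are made up for illustration. Evaluated at the first component's mean, the membership is about 0.88 when the means are 2 apart, and indistinguishable from 1 when they are 10 apart:

```r
# Posterior probability of component 1 in an equal-weight, equal-variance
# two-component Gaussian mixture. `posterior1` and all parameter values
# are hypothetical, chosen only to illustrate the effect of separation.
posterior1 <- function(x, mu1, mu2, sd = 1) {
  d1 <- dnorm(x, mean = mu1, sd = sd)  # density under component 1
  d2 <- dnorm(x, mean = mu2, sd = sd)  # density under component 2
  d1 / (d1 + d2)
}

# Overlapping components: a point at the first mean is still somewhat ambiguous.
p_close <- posterior1(-1, mu1 = -1, mu2 = 1)

# Well-separated components: the same kind of point is assigned with near certainty.
p_far <- posterior1(-5, mu1 = -5, mu2 = 5)

print(c(close = p_close, far = p_far))
```

So a z matrix full of 1.00 entries may simply mean the clusters barely overlap.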

For instance, if you make a synthetic (still easy to separate) 2d
dataset like so:

R> set.seed(123)
R> x1 <- rnorm(100, -1, 1)
R> y1 <- rnorm(100, -1, 1)

R> x2 <- rnorm(100, 1, 1)
R> y2 <- rnorm(100, 1, 1)

You can plot it to see the "easy to split" clusters:

R> plot(x1,y1,pch=19,cex=.7,col="blue", ylim=c(-10, 10), xlim=c(-10,10))
R> points(x2,y2,pch=19,cex=.7,col="red")

Let's see what Mclust tells us:

R> library(mclust)
R> m <- rbind(cbind(x1,y1), cbind(x2,y2))
R> M <- Mclust(m, 2)

Although most points have a very high probability of landing in one
cluster, some do not, e.g.:

R> sum(apply(M$z, 1, function(row) any(row > .8)))
[1] 175

So 175 of the 200 points are assigned to a cluster with probability > 0.8;
the other 25 are less certain.
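
If what you want is a per-point measure of how strongly each sample belongs to its cluster, one simple option is to take the largest entry in each row of the z matrix and look at how far it is from 1. Below is a minimal sketch on a toy matrix shaped like `M$z`; the row names, the values, and the 0.8 cutoff are invented for illustration:

```r
# Toy posterior matrix in the same layout as M$z:
# rows are points, columns are clusters. Values are made up.
z <- rbind(a = c(0.99, 0.01),
           b = c(0.55, 0.45),
           c = c(0.10, 0.90))

best  <- apply(z, 1, max)        # probability of the most likely cluster
label <- apply(z, 1, which.max)  # index of that cluster
uncer <- 1 - best                # 0 means a completely confident assignment

# Points whose best membership does not exceed the (arbitrary) 0.8 cutoff
ambiguous <- names(best)[best <= 0.8]

print(data.frame(label, best, uncer))
print(ambiguous)
```

As far as I know, the object returned by Mclust also carries an `uncertainty` component holding the same 1 - max(z) quantity, so `sort(M$uncertainty, decreasing=TRUE)` should list the most ambiguous points first.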

-steve

-- 
Steve Lianoglou
Graduate Student: Computational Systems Biology
 | Memorial Sloan-Kettering Cancer Center
 | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact