[BioC] RE: venn diagram

Patrick Cahan pcahan1 at gwu.edu
Wed Apr 28 15:01:52 CEST 2004

Any ideas on how to calculate the significance, i.e. the probability of getting a given similarity score by chance?
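Here is one possible answer, sketched in R under stated assumptions. Suppose each set is a 0/1 membership vector over a common universe of N items and both set sizes are held fixed. The cosine score (dot product divided by the product of the Euclidean norms) then depends only on the overlap, so the chance of a score at least this large follows the hypergeometric distribution used in standard set-overlap tests. The helper `cosine_sim` is hypothetical. The null model also assumes every item is equally likely to belong to a set, which often does not hold for real gene lists. A permutation estimate is included as a cross-check.

```r
# Hypothetical helper: cosine similarity of two numeric vectors.
cosine_sim <- function(a, b) {
  sum(a * b) / (sqrt(sum(a^2)) * sqrt(sum(b^2)))
}

# Assumption: 0/1 membership vectors over the same universe of items.
a <- c(1, 1, 0, 0, 1, 0)
b <- c(0, 1, 1, 0, 1, 1)
s_obs <- cosine_sim(a, b)

# Exact p-value: with set sizes fixed, the cosine is a monotone function of
# the overlap sum(a * b), and under random membership the overlap is
# hypergeometric. P(overlap >= observed) = P(overlap > observed - 1).
N       <- length(a)
ka      <- sum(a)
kb      <- sum(b)
overlap <- sum(a * b)
p_exact <- phyper(overlap - 1, kb, N - kb, ka, lower.tail = FALSE)

# Permutation p-value: shuffle b's memberships (its size stays the same) and
# count how often the shuffled score reaches the observed one.
set.seed(1)
n_perm <- 10000
s_null <- replicate(n_perm, cosine_sim(a, sample(b)))
p_perm <- (sum(s_null >= s_obs - 1e-12) + 1) / (n_perm + 1)
```

The exact value for these two vectors is 0.8, and the permutation estimate should land close to it. So an overlap of 2 out of 6 items is entirely unremarkable here. The permutation version is the one to adapt if you replace the cosine with a score that is not a simple function of the overlap.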

/pc

Patrick Cahan
202.994.8922
pcahan1 at gwu.edu

> You can then create a distance matrix from this by calculating all
> pairwise combinations of the length-normalized cosine between the
> vectors:
> > a <- c(1,1,0,0,1,0)
> > b <- c(0,1,1,0,1,1)
> > x <- a%*%b / (length(a) * length(b))
> > x
>           [,1]
> [1,] 0.05555556
>
> x is a measure of the similarity between vectors a and b. This is
> a standard procedure in text/document comparison. Since one wants
> to create a distance matrix, one still needs to somehow "invert"
> this matrix so that high similarity gets small values!
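The quoted code divides the dot product by the number of elements, length(a) * length(b), rather than by the vectors' Euclidean norms. The 0.0556 it prints is therefore not the usual cosine. Below is a minimal sketch of the standard cosine similarity, using 1 - cosine as one simple way to "invert" it. The function name `cosine_sim` is illustrative, not from any package.

```r
# Cosine similarity: dot product divided by the product of Euclidean norms.
# Note: an all-zero vector (empty set) gives 0/0 = NaN.
cosine_sim <- function(a, b) {
  sum(a * b) / (sqrt(sum(a^2)) * sqrt(sum(b^2)))
}

a <- c(1, 1, 0, 0, 1, 0)
b <- c(0, 1, 1, 0, 1, 1)

s <- cosine_sim(a, b)  # 2 / (sqrt(3) * 2), about 0.577
d <- 1 - s             # inverted: high similarity -> small distance
```

For nonnegative 0/1 vectors the cosine lies between 0 (disjoint sets) and 1 (identical sets), so 1 - s also falls in [0, 1].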
>
> Once you have your matrix M of cosines (a symmetric matrix), invert
> it as described above, convert it via as.dist(), and pass the result
> to the hclust routine.
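The sketch below shows the full pipeline for several sets. It assumes each row of a 0/1 matrix is one set over a shared universe of items, and the four sets are made-up example data. All pairwise cosines are computed at once from the matrix product, inverted with 1 - M, and clustered. Average linkage is just one of the methods hclust supports.

```r
# Made-up example data: each row is one set encoded as 0/1 membership
# over the same six items.
sets <- rbind(
  A = c(1, 1, 0, 0, 1, 0),
  B = c(0, 1, 1, 0, 1, 1),
  C = c(1, 1, 0, 0, 1, 1),
  D = c(0, 0, 1, 1, 0, 1)
)

# All pairwise dot products via the matrix product, divided by the outer
# product of the row norms, give the symmetric cosine matrix M.
norms <- sqrt(rowSums(sets^2))
M <- (sets %*% t(sets)) / outer(norms, norms)

# Invert (high similarity -> small distance), convert, and cluster.
dd <- as.dist(1 - M)
hc <- hclust(dd, method = "average")
plot(hc)  # dendrogram of the four sets
```

In this toy data A and C share three items and are merged first.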
> I'd be interested in the outcome (does it make sense?), if you're
> willing to share it. You should only try this if you've got *many*
> sets to compare, so that a real Venn diagram gets too complex.
>
> 	good luck and let me know how it goes,
> 	+regards,
>
> 	Arne
>
> --
> Arne Muller, Ph.D.
> Toxicogenomics, Aventis Pharma
> arne dot muller domain=aventis com