[BioC] RE: venn diagram

Wed Apr 28 15:01:52 CEST 2004

Any ideas on how to calculate the significance or rather the probability of getting a given similarity score by chance?  

/pc

Patrick Cahan
202.994.8922
pcahan1 at gwu.edu

> You can then create a distance matrix from this by calculating the 
> length-normalized cosine for all pairwise combinations of the 
> vectors:
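> > a <- c(1,1,0,0,1,0)
> > b <- c(0,1,1,0,1,1)
> > # cosine: dot product divided by the Euclidean norms of a and b
> > # (length() would only give the number of elements, not the norm)
> > x <- a%*%b / (sqrt(sum(a^2)) * sqrt(sum(b^2)))
> > x
>           [,1]
> [1,] 0.5773503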
> 
> x is a measure of the similarity between vectors a and b. This is 
> a standard procedure in text/document comparison. Since one wants 
> to create a distance matrix, one still needs to somehow "invert" 
> this matrix (for example by taking 1 - x) so that high similarity 
> gets small values!
> 
> Once you have your matrix M of cosines (this is a symmetric 
> matrix), you convert it via as.dist(M) and pass the result to the 
> hclust routine.
> I'd be interested in the outcome (does it make sense?) - if you're 
> interested. You should only try it if you've got *many* sets to 
> test, so that a real Venn approach gets too complex.
> 
> 	good luck and let me know how it goes,
> 	+regards,
> 
> 	Arne
> 
> --
> Arne Muller, Ph.D.
> Toxicogenomics, Aventis Pharma
> arne dot muller domain=aventis com
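
The procedure described in the quoted message can be sketched in R roughly as follows. This is a minimal sketch under stated assumptions, not Arne's exact code: the third set `d` and the matrix name `sets` are made up for illustration, the cosine is normalized by the Euclidean norm of each vector (the usual definition), and 1 - similarity is just one possible way to turn similarities into distances.

```r
# Hypothetical membership vectors (1 = item is in the set, 0 = it is not).
# 'a' and 'b' are taken from the message; 'd' is invented for illustration.
sets <- rbind(
  a = c(1, 1, 0, 0, 1, 0),
  b = c(0, 1, 1, 0, 1, 1),
  d = c(1, 1, 0, 0, 0, 0)
)

# Cosine similarity for all pairs of rows:
# dot products divided by the product of the Euclidean norms.
norms <- sqrt(rowSums(sets^2))
S <- (sets %*% t(sets)) / (norms %o% norms)

# "Invert" the similarities so that similar sets get small distances.
# 1 - S is one simple choice (an assumption, other transforms are possible).
D <- as.dist(1 - S)

# Hierarchical clustering on the resulting distance object.
hc <- hclust(D, method = "average")
```

Calling plot(hc) afterwards draws the dendrogram. With only three sets this is a toy case; as the message notes, the approach is mainly worth trying when there are many sets.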