[BioC] venn diagram

Arne.Muller at aventis.com
Wed Apr 28 11:16:31 CEST 2004


The problem with Venn diagrams is that for more than three sets the visualization gets messy. Maybe you can use a tabular representation instead of graphics.
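One way to get such a table is to label every gene with the combination of lists it occurs in; each distinct combination is one region of the Venn diagram, and counting genes per label gives the numbers a diagram would show. A minimal sketch in base R, assuming the gene lists are held in a named R list (the list `gene_sets` and the gene and treatment names are made up for illustration):

```r
# Hypothetical gene lists, one per treatment; all names are made up.
gene_sets <- list(
  treatA = c("g1", "g2", "g3", "g5", "g8"),
  treatB = c("g2", "g3", "g4"),
  treatC = c("g3", "g5", "g6"),
  treatD = c("g1", "g3", "g6", "g7", "g8")
)

# Every gene that occurs in at least one list.
all_genes <- sort(unique(unlist(gene_sets, use.names = FALSE)))

# Label each gene with the combination of lists it occurs in,
# e.g. "treatA&treatD"; each distinct label is one Venn region.
region <- vapply(all_genes, function(g) {
  in_set <- vapply(gene_sets, function(s) g %in% s, logical(1))
  paste(names(gene_sets)[in_set], collapse = "&")
}, character(1))

# Number of genes per region, as a table rather than a drawing.
region_table <- table(region)
print(region_table)

# The genes in one particular region, here those common to all lists.
common <- names(region)[region == paste(names(gene_sets), collapse = "&")]
print(common)
```

This scales to any number of lists; only the number of possible regions (2^m - 1 for m lists) grows.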

With a dendrogram you could visualize the similarity of your gene lists using vector similarity. Take the union of all m sets; this is a superset with n elements. Create an n x m matrix (n rows, m columns) whose row names are the gene names. Each cell is either 0 or 1, depending on whether gene x is present in set y.
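A minimal sketch of building that 0/1 matrix in base R, assuming each gene list is a character vector stored in a named list (`gene_sets` and the gene names are hypothetical):

```r
# Hypothetical gene lists (made-up names), one per treatment: m = 3 sets.
gene_sets <- list(
  treatA = c("g1", "g2", "g3", "g5"),
  treatB = c("g2", "g3", "g4"),
  treatC = c("g3", "g5", "g6")
)

# Union of all m sets: the n distinct genes.
all_genes <- sort(unique(unlist(gene_sets, use.names = FALSE)))

# n x m matrix of 0/1: rows are genes, columns are sets;
# entry [x, y] is 1 if gene x is present in set y.
membership <- sapply(gene_sets, function(s) as.integer(all_genes %in% s))
rownames(membership) <- all_genes
print(membership)
```

sapply() takes the column names from the names of the list, so each column is labelled with its treatment.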

You can then create a distance matrix from this by calculating, for all pairwise combinations of columns, the length-normalized cosine between the vectors, i.e. their dot product divided by the product of their Euclidean lengths:

> a <- c(1,1,0,0,1,0)
> b <- c(0,1,1,0,1,1)
> # length() counts elements, so use the Euclidean norm sqrt(sum(v^2)) instead
> x <- sum(a * b) / (sqrt(sum(a^2)) * sqrt(sum(b^2)))
> x
[1] 0.5773503

x is a measure of the similarity between vectors a and b. This is a standard procedure in text/document comparison. Since one wants to create a distance matrix, one still needs to "invert" the similarities so that high similarity gets small values, e.g. by using 1 - x (the cosine of two 0/1 vectors always lies between 0 and 1).
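For many sets, all pairwise cosines can be computed at once from the matrix of column dot products. A minimal sketch, assuming a 0/1 matrix with genes in rows and sets in columns; the helper `cosine_matrix` and the column names are hypothetical, and 1 - cosine is just one common way to turn the similarity into a distance:

```r
# Hypothetical 0/1 membership matrix: rows are genes, columns are gene sets.
membership <- cbind(
  setA = c(1, 1, 0, 0, 1, 0),
  setB = c(0, 1, 1, 0, 1, 1),
  setC = c(1, 1, 0, 0, 1, 1)
)

# Pairwise cosine similarity between columns:
# cos(a, b) = sum(a * b) / (||a|| * ||b||), with ||a|| the Euclidean norm.
cosine_matrix <- function(m) {
  cross <- crossprod(m)          # m x m matrix of dot products, t(m) %*% m
  norms <- sqrt(diag(cross))     # Euclidean norm of each column
  cross / outer(norms, norms)    # divide each dot product by both norms
}

sim <- cosine_matrix(membership)

# "Invert" so that similar sets are close: 0 for identical sets,
# 1 for sets with no gene in common.
dist_mat <- 1 - sim
print(round(dist_mat, 3))
```

The setA/setB entry of `sim` equals the single value computed for a and b above (1/sqrt(3)).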

Once you have the symmetric m x m matrix M of these distances, convert it via as.dist(M) and pass the result to the hclust routine.
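Putting the steps together, a minimal sketch from a 0/1 membership matrix to a dendrogram. The data are hypothetical, and average linkage is an arbitrary choice here (hclust() defaults to complete linkage):

```r
# Hypothetical 0/1 membership matrix: rows are genes, columns are gene sets.
membership <- cbind(
  setA = c(1, 1, 0, 0, 1, 0, 0, 0),
  setB = c(1, 1, 0, 0, 1, 1, 0, 0),
  setC = c(0, 0, 1, 1, 0, 0, 1, 1),
  setD = c(0, 0, 1, 1, 0, 0, 1, 0)
)

# Column-wise cosine similarity, then 1 - cosine as the distance matrix M.
norms <- sqrt(colSums(membership^2))
sim <- crossprod(membership) / outer(norms, norms)
M <- 1 - sim

# as.dist() keeps the lower triangle of the symmetric matrix and the
# column names as labels; hclust() then clusters the sets.
d <- as.dist(M)
hc <- hclust(d, method = "average")

# Cut the tree into two groups of sets.
groups <- cutree(hc, k = 2)
print(groups)

# Draw the dendrogram when running interactively.
if (interactive()) plot(hc, main = "Similarity of gene sets", xlab = "", sub = "")
```

With these made-up sets, setA and setB share most of their genes, as do setC and setD, so the two-group cut separates those pairs.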

I'd be interested in the outcome (does it make sense?) if you decide to try it. You should only try this if you've got *many* sets to compare, so that a real Venn approach gets too complex.

	good luck and let me know how it goes,


Arne Muller, Ph.D.
Toxicogenomics, Aventis Pharma
arne dot muller domain=aventis com

> -----Original Message-----
> From: bioconductor-bounces at stat.math.ethz.ch
> [mailto:bioconductor-bounces at stat.math.ethz.ch]On Behalf Of Anthony
> Bosco
> Sent: 28 April 2004 10:41
> To: bioconductor at stat.math.ethz.ch
> Subject: [BioC] venn diagram
> Hi.
> I actually want to compare lists of gene names (not expression data) 
> using venn diagram tools.
> For example if I have a cell line and stimulate with several 
> different treatments I want to know which genes are differentially 
> expressed in all treatments or only some of the treatments.
> I would also like to look at this graphically to get an overview of 
> which treatments are more similar.
> I realise that heatmap functions etc would show 
> similarities/differences b/w treatments but in this particular case I 
> want to use venn diagrams.
> Regards
> Anthony
> -- 
> ______________________________________________
> Anthony Bosco - PhD Student
> Institute for Child Health Research
> (Company Limited by Guarantee ACN 009 278 755)
> Subiaco, Western Australia, 6008
> Ph 61 8 9489  , Fax 61 8 9489 7700
> email anthonyb at ichr.uwa.edu.au
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://www.stat.math.ethz.ch/mailman/listinfo/bioconductor
