# [BioC] venn diagram

Aedin aedin.culhane at ucd.ie
Wed Apr 28 14:52:08 CEST 2004

```
Hi,
If you are comparing categorical data such as this, maybe try Multiple
Correspondence Analysis (available in the ade4 package).
Aedin

-----Original Message-----
From: bioconductor-bounces at stat.math.ethz.ch
[mailto:bioconductor-bounces at stat.math.ethz.ch]On Behalf Of
Arne.Muller at aventis.com
Sent: 28 April 2004 10:17
To: anthonyb at ichr.uwa.edu.au; bioconductor at stat.math.ethz.ch
Subject: RE: [BioC] venn diagram

Hi,

The problem with Venn diagrams is that for more than 3 sets the
visualization gets messy. Maybe you can just go for a tabular representation
instead of graphics.

With a dendrogram you could visualize your gene list similarity using vector
similarity. Take the union of all m sets. This is a superset with n
elements. Create an n*m matrix (n rows, m columns) where the row names
represent gene names. The value of each cell is either 0 or 1, depending on
whether gene x is present in set y.

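You can then create a distance matrix from this by calculating, for all
pairwise combinations, the length-normalized cosine between the vectors
(the dot product divided by the product of the vector norms; note that R's
length() returns the number of elements, not the norm):

> a <- c(1,1,0,0,1,0)
> b <- c(0,1,1,0,1,1)
> x <- a%*%b / (sqrt(sum(a^2)) * sqrt(sum(b^2)))
> x
          [,1]
[1,] 0.5773503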

x is a measure of the similarity between vectors a and b. This is a
standard procedure in text/document comparison. Since one wants to create a
distance matrix, one still needs to somehow "invert" this matrix so that high
similarity gets small values!

Once you have your matrix M of cosines (a symmetric m*m matrix), convert it
via as.dist(M) and pass it to the hclust routine.

I'd be interested in the outcome (does it make sense?) - if you're
interested. You should only try it if you've got *many* sets to test, so
that a real Venn approach gets too complex.

Good luck and let me know how it goes,
regards,

Arne

--
Arne Muller, Ph.D.
Toxicogenomics, Aventis Pharma
arne dot muller domain=aventis com

> -----Original Message-----
> From: bioconductor-bounces at stat.math.ethz.ch
> [mailto:bioconductor-bounces at stat.math.ethz.ch]On Behalf Of Anthony
> Bosco
> Sent: 28 April 2004 10:41
> To: bioconductor at stat.math.ethz.ch
> Subject: [BioC] venn diagram
>
>
> Hi.
>
> I actually want to compare lists of gene names (not expression data)
> using venn diagram tools.
>
> For example if I have a cell line and stimulate with several
> different treatments I want to know which genes are differentially
> expressed in all treatments or only some of the treatments.
>
> I would also like to look at this graphically to get an overview of
> which treatments are more similar.
>
> I realise that heatmap functions etc would show
> similarities/differences b/w treatments but in this particular case I
> want to use venn diagrams.
>
>
> Regards
>
>
> Anthony
> --
> ______________________________________________
>
> Anthony Bosco - PhD Student
>
> Institute for Child Health Research
> (Company Limited by Guarantee ACN 009 278 755)
> Subiaco, Western Australia, 6008
>
> Ph 61 8 9489  , Fax 61 8 9489 7700
> email anthonyb at ichr.uwa.edu.au
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://www.stat.math.ethz.ch/mailman/listinfo/bioconductor
>

```
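
The procedure Arne describes (union of the gene lists, 0/1 incidence matrix, pairwise cosine similarity, distance matrix, hclust) could be sketched in base R roughly as follows. This is only an illustration under stated assumptions: the gene lists and treatment names are made up, the cosine is computed with Euclidean norms, and `1 - cosine` is just one possible way to "invert" similarity into a distance, since the thread does not specify one. The first two hypothetical lists are chosen so that their columns equal the vectors `a` and `b` from the message.

```r
# Hypothetical gene lists, one per treatment (names and contents are made up).
gene_sets <- list(
  treatA = c("g1", "g2", "g5"),
  treatB = c("g2", "g3", "g5", "g6"),
  treatC = c("g1", "g2", "g4", "g5"),
  treatD = c("g3", "g6")
)

# Union of all m sets: the n genes that index the rows.
all_genes <- sort(unique(unlist(gene_sets)))

# n x m incidence matrix: 1 if the gene is present in the set, 0 otherwise.
incidence <- sapply(gene_sets, function(s) as.integer(all_genes %in% s))
rownames(incidence) <- all_genes

# Cosine similarity for all pairs of columns:
# dot products divided by the product of the Euclidean norms.
norms <- sqrt(colSums(incidence^2))
cos_sim <- crossprod(incidence) / outer(norms, norms)

# Assumption: turn similarity into a distance via 1 - cosine,
# so that highly similar lists get small values.
cos_dist <- as.dist(1 - cos_sim)

# Hierarchical clustering and dendrogram of the treatments.
hc <- hclust(cos_dist, method = "average")
plot(hc, main = "Gene list similarity (1 - cosine)")
```

An empty gene list would have norm 0 and produce NaN entries, so such lists would need to be dropped first. The linkage method (`"average"` here) is also an arbitrary choice for the sketch.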