[BioC] venn diagram
Arne.Muller at aventis.com
Wed Apr 28 11:16:31 CEST 2004
Hi,
The problem with Venn diagrams is that for more than 3 sets the visualization gets messy. Maybe you can just use a tabular representation instead of graphics.
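A tabular overview along these lines could be sketched in R as follows. This is only a sketch under assumptions: the treatment and gene names are hypothetical, and building the table with stack() and table() is just one convenient choice. It counts how many genes fall into each combination of sets (the information a Venn diagram shows), lists the genes present in every set, and gives the pairwise number of shared genes.

```r
# Hypothetical gene lists, one per treatment (names are illustrative only)
sets <- list(
  treatA = c("geneA", "geneB", "geneC", "geneE"),
  treatB = c("geneB", "geneC", "geneD"),
  treatC = c("geneB", "geneE", "geneF")
)

# Long format: one row per (gene, set) pair; columns are 'values' and 'ind'
long <- stack(sets)

# Gene x set presence table; "> 0" guards against a gene listed twice in a set
member <- unclass(table(long$values, long$ind)) > 0

# Label each gene with the sets that contain it, e.g. "treatA&treatB"
pattern <- apply(member, 1, function(row) paste(colnames(member)[row], collapse = "&"))

# Tabular "Venn diagram": number of genes per membership pattern
overlap <- table(pattern)
print(overlap)

# Genes present in every set
in_all <- rownames(member)[rowSums(member) == ncol(member)]
print(in_all)

# Pairwise number of shared genes (diagonal = set sizes)
m01 <- member * 1
shared <- t(m01) %*% m01
print(shared)
```

Sets with large off-diagonal counts in shared are the treatments that behave most similarly.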
With a dendrogram you could visualize the similarity of your gene lists using vector similarity. Take the union of all m sets; this is a superset with n elements. Create an n x m matrix (m columns, one per set) whose row names are the gene names. Each cell is 1 if gene x is present in set y and 0 otherwise.
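A minimal sketch of building this 0/1 matrix in R, assuming the gene lists are held in a named list (the list name sets and all gene and treatment names are hypothetical):

```r
# Hypothetical gene lists, one per set (names are illustrative only)
sets <- list(
  treat1 = c("geneA", "geneB", "geneE"),
  treat2 = c("geneB", "geneC", "geneE", "geneF"),
  treat3 = c("geneA", "geneB", "geneE", "geneF")
)

# Union of all m sets: the n genes that become the row names
genes <- sort(unique(unlist(sets, use.names = FALSE)))

# n x m matrix: 1 if the gene (row) is in the set (column), else 0
M01 <- sapply(sets, function(s) as.integer(genes %in% s))
rownames(M01) <- genes
print(M01)
```

The column names come from the names of the list, so each column identifies one treatment.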
You can then create a distance matrix from this by calculating the cosine for all pairwise combinations of column vectors, i.e. their dot product normalized by the product of their Euclidean lengths:
> a <- c(1,1,0,0,1,0)
> b <- c(0,1,1,0,1,1)
> x <- sum(a * b) / (sqrt(sum(a^2)) * sqrt(sum(b^2)))
> x
[1] 0.5773503
x is a measure of the similarity between vectors a and b; this is a standard procedure in text/document comparison. Since we want a distance matrix, the similarities still need to be "inverted" so that high similarity gives small values, e.g. by using 1 - x.
Once you have your symmetric m x m matrix M of cosines, invert it as described, convert the result with as.dist(), and pass it to the hclust routine.
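The whole pipeline might look like the sketch below. It assumes a small hypothetical 0/1 matrix (columns a and b reuse the vectors from the example, column c is made up), uses 1 - cosine as the "inversion", and uses average linkage in hclust(); both are choices, not the only options.

```r
# Hypothetical 0/1 membership matrix: rows = genes, columns = sets
M01 <- cbind(
  a = c(1, 1, 0, 0, 1, 0),
  b = c(0, 1, 1, 0, 1, 1),
  c = c(1, 1, 0, 0, 1, 1)   # made-up third set
)

# Pairwise cosine similarity between columns:
# dot products divided by the product of the Euclidean norms
dots  <- crossprod(M01)          # m x m matrix, same as t(M01) %*% M01
norms <- sqrt(diag(dots))        # Euclidean length of each column
cosM  <- dots / outer(norms, norms)

# "Invert" the similarity so that similar sets get small distances
D <- as.dist(1 - cosM)

# Hierarchical clustering of the sets (average linkage is an assumption)
hc <- hclust(D, method = "average")
print(round(cosM, 3))
```

plot(hc) then draws the dendrogram; sets that share many genes join at low heights (here a and c merge first).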
I'd be interested in the outcome (does it make sense?), if you feel like sharing it. You should only try this if you have *many* sets to compare, so many that a real Venn diagram becomes too complex.
good luck and let me know how it goes,
regards,
Arne
--
Arne Muller, Ph.D.
Toxicogenomics, Aventis Pharma
arne dot muller domain=aventis com
> -----Original Message-----
> From: bioconductor-bounces at stat.math.ethz.ch
> [mailto:bioconductor-bounces at stat.math.ethz.ch]On Behalf Of Anthony
> Bosco
> Sent: 28 April 2004 10:41
> To: bioconductor at stat.math.ethz.ch
> Subject: [BioC] venn diagram
>
>
> Hi.
>
> I actually want to compare lists of gene names (not expression data)
> using venn diagram tools.
>
> For example if I have a cell line and stimulate with several
> different treatments I want to know which genes are differentially
> expressed in all treatments or only some of the treatments.
>
> I would also like to look at this graphically to get an overview of
> which treatments are more similar.
>
> I realise that heatmap functions etc would show
> similarities/differences b/w treatments but in this particular case I
> want to use venn diagrams.
>
>
> Regards
>
>
> Anthony
> --
> ______________________________________________
>
> Anthony Bosco - PhD Student
>
> Institute for Child Health Research
> (Company Limited by Guarantee ACN 009 278 755)
> Subiaco, Western Australia, 6008
>
> Ph 61 8 9489 , Fax 61 8 9489 7700
> email anthonyb at ichr.uwa.edu.au
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://www.stat.math.ethz.ch/mailman/listinfo/bioconductor
>