[BioC] Venn Diagram

Hervé Pagès hpages at fhcrc.org
Thu Jul 2 22:25:27 CEST 2009


Nice page, Thomas; really a must-see!

Thanks,
H.

Thomas Girke wrote:
> To get an impression of how "pretty and confusingly complex" Venn diagrams
> with more than 5 sets look, take a look at this page from
> combinatorics.org:
> http://www.combinatorics.org/Surveys/ds5/VennSymmEJC.html.
> 
> Also, here is a small collection of methods/ideas for analyzing intersection
> relationships among large numbers of sample sets:
> http://faculty.ucr.edu/~tgirke/Documents/R_BioCond/R_BioCondManual.html#R_graphics_overlapper
> These approaches are much more scalable than Venn comparisons, but lack
> their logical 'not in' relations. In terms of utility, the function for
> computing 'All Possible Intersects' is the closest alternative to Venn
> diagrams.
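For contrast with a Venn partition, here is a minimal sketch of the plain "intersect only" idea: for every combination of sets it reports the size of their intersection, without excluding members of the remaining sets (the 'not in' part a Venn region carries). The name allIntersectSizes() and its maxk parameter are hypothetical; this is not the 'All Possible Intersects' function from the manual linked above. The number of combinations still grows as 2^n - 1, so restricting maxk (e.g. to pairs) is what keeps it manageable for many sets.

```r
## Hypothetical helper (not from any package): size of the plain
## intersection for every combination of up to 'maxk' sets. Unlike a
## Venn region, members of the remaining sets are NOT excluded.
allIntersectSizes <- function(sets, maxk = length(sets))
{
    n <- length(sets)
    if (is.null(names(sets)))
        names(sets) <- paste0("S", seq_len(n))
    res <- list()
    for (k in seq_len(min(maxk, n))) {
        ## each column of 'combos' holds the indices of k sets
        combos <- combn(n, k)
        for (j in seq_len(ncol(combos))) {
            idx <- combos[, j]
            label <- paste(names(sets)[idx], collapse = "&")
            res[[label]] <- length(unique(Reduce(intersect, sets[idx])))
        }
    }
    unlist(res)
}

mysets <- list(c(1,5,12,4,9,29), c(4,11,3,18), c(22,4,12,19,8),
               c(7,12,4,5,3), c(25,24,4,2))
allIntersectSizes(mysets, maxk = 2)   # single sets and pairs only
```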
> 
> Thomas
> 
> 
> On Wed, Jul 01, 2009 at 09:41:25PM -0700, hpages at fhcrc.org wrote:
>> Oops, this is wrong, sorry! See a modified version of
>> makeVennTable() below that hopefully does the right thing.
>>
>> Quoting Hervé Pagès <hpages at fhcrc.org>:
>>
>>> Hi Simon,
>>>
>>> Simon Noël wrote:
>>>> Hello everyone.
>>>>
>>>> I have ten lists of between 4 and 3000 genes each, and I would like to
>>>> put them all together in a Venn diagram.
>>>>
>>>> I have tried loading the ABarray library and using doVennDiagram, but it
>>>> can only use 3 lists.
>>>>
>>>> Does anyone know a way to put all ten of my lists in the same Venn
>>>> diagram?
>>> A Venn diagram is a 2-D drawing of all the possible intersections
>>> between 2 or 3 sets, where each set is represented by a simple 2-D
>>> shape (typically a circle). In the case of 3 sets, the resulting
>>> diagram partitions the 2-D plane into 8 regions.
>>> Some people have tried (with more or less success) to put 4 sets on
>>> the diagram, but then they need to use more complicated shapes and
>>> the resulting diagram is no longer as easy to read. With 10 sets,
>>> you would end up with 1024 (2^10) regions in your drawing and you
>>> would need extremely complicated shapes for each region, making it
>>> really hard to read! In that case it may be easier to generate the
>>> table below.
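The region count follows from the fact that each region corresponds to one in/out membership pattern across the n sets, hence 2^n regions. A quick check in R, using base expand.grid() to enumerate the patterns (nRegions() is just an illustrative name):

```r
## Each Venn region is one in/out membership pattern across n sets,
## so there are 2^n regions (including the one outside all sets).
nRegions <- function(n)
    nrow(expand.grid(rep(list(c(TRUE, FALSE)), n)))

sapply(c(2, 3, 4, 10), nRegions)   # 4, 8, 16 and 1024 regions
```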
>>>
>>> ## Let's say your genes are in 'set1', 'set2', etc... Put all the
>>> ## sets in a big list:
>>>
>>> mysets <- list(set1, set2, ..., set10)
>>>
>>> makeVennTable <- function(sets)
>>> {
>>>  mkAllLogicalVect <- function(length)
>>>  {
>>>    if (length == 0L)
>>>      return(logical(0))
>>>    ans0 <- mkAllLogicalVect(length - 1L)
>>>    ans1 <- cbind(TRUE, ans0)
>>>    ans2 <- cbind(FALSE, ans0)
>>>    rbind(ans1, ans2)
>>>  }
>>>  lm <- mkAllLogicalVect(length(sets))
>>>  subsets <- apply(lm, MARGIN=1,
>>>               function(ii)
>>>               {
>>>                 s <- sets[ii]
>>>                 if (length(s) == 0)
>>>                   return("")
>>>                 paste(sort(unique(unlist(s))), collapse=",")
>>>               })
>>>  data.frame(lm, subsets)
>>> }
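The recursive helper mkAllLogicalVect() builds all 2^n TRUE/FALSE rows, with the first column varying slowest and TRUE before FALSE. A minimal non-recursive sketch using base expand.grid() should produce the same matrix; since expand.grid() varies its first column fastest, the columns are reversed. allLogicalCombos() is a hypothetical name, not part of any package, and assumes n >= 1.

```r
## Hypothetical non-recursive equivalent of mkAllLogicalVect():
## all 2^n TRUE/FALSE combinations, first column varying slowest.
allLogicalCombos <- function(n)
{
    grid <- expand.grid(rep(list(c(TRUE, FALSE)), n))
    ## expand.grid() varies its first column fastest, so reverse the
    ## column order to get the same row order as the recursive version
    unname(as.matrix(rev(grid)))
}

allLogicalCombos(3)
```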
>>>
>>> Then call makeVennTable() on 'mysets'. For example, with 5 small sets:
>>>
>>>  > mysets <- list(c(1,5,12,4,9,29),
>>>                  c(4,11,3,18),
>>>                  c(22,4,12,19,8),
>>>                  c(7,12,4,5,3),
>>>                  c(25,24,4,2))
>>>
>>>  > makeVennTable(mysets)
>>>        X1    X2    X3    X4    X5                                 subsets
>>>  1   TRUE  TRUE  TRUE  TRUE  TRUE 1,2,3,4,5,7,8,9,11,12,18,19,22,24,25,29
>>>  2   TRUE  TRUE  TRUE  TRUE FALSE         1,3,4,5,7,8,9,11,12,18,19,22,29
>>>  3   TRUE  TRUE  TRUE FALSE  TRUE   1,2,3,4,5,8,9,11,12,18,19,22,24,25,29
>>>  4   TRUE  TRUE  TRUE FALSE FALSE           1,3,4,5,8,9,11,12,18,19,22,29
>>>  5   TRUE  TRUE FALSE  TRUE  TRUE         1,2,3,4,5,7,9,11,12,18,24,25,29
>>>  6   TRUE  TRUE FALSE  TRUE FALSE                 1,3,4,5,7,9,11,12,18,29
>>>  7   TRUE  TRUE FALSE FALSE  TRUE           1,2,3,4,5,9,11,12,18,24,25,29
>>>  8   TRUE  TRUE FALSE FALSE FALSE                   1,3,4,5,9,11,12,18,29
>>>  9   TRUE FALSE  TRUE  TRUE  TRUE       1,2,3,4,5,7,8,9,12,19,22,24,25,29
>>>  10  TRUE FALSE  TRUE  TRUE FALSE               1,3,4,5,7,8,9,12,19,22,29
>>>  11  TRUE FALSE  TRUE FALSE  TRUE           1,2,4,5,8,9,12,19,22,24,25,29
>>>  12  TRUE FALSE  TRUE FALSE FALSE                   1,4,5,8,9,12,19,22,29
>>>  13  TRUE FALSE FALSE  TRUE  TRUE               1,2,3,4,5,7,9,12,24,25,29
>>>  14  TRUE FALSE FALSE  TRUE FALSE                       1,3,4,5,7,9,12,29
>>>  15  TRUE FALSE FALSE FALSE  TRUE                   1,2,4,5,9,12,24,25,29
>>>  16  TRUE FALSE FALSE FALSE FALSE                           1,4,5,9,12,29
>>>  17 FALSE  TRUE  TRUE  TRUE  TRUE        2,3,4,5,7,8,11,12,18,19,22,24,25
>>>  18 FALSE  TRUE  TRUE  TRUE FALSE                3,4,5,7,8,11,12,18,19,22
>>>  19 FALSE  TRUE  TRUE FALSE  TRUE            2,3,4,8,11,12,18,19,22,24,25
>>>  20 FALSE  TRUE  TRUE FALSE FALSE                    3,4,8,11,12,18,19,22
>>>  21 FALSE  TRUE FALSE  TRUE  TRUE                2,3,4,5,7,11,12,18,24,25
>>>  22 FALSE  TRUE FALSE  TRUE FALSE                        3,4,5,7,11,12,18
>>>  23 FALSE  TRUE FALSE FALSE  TRUE                       2,3,4,11,18,24,25
>>>  24 FALSE  TRUE FALSE FALSE FALSE                               3,4,11,18
>>>  25 FALSE FALSE  TRUE  TRUE  TRUE              2,3,4,5,7,8,12,19,22,24,25
>>>  26 FALSE FALSE  TRUE  TRUE FALSE                      3,4,5,7,8,12,19,22
>>>  27 FALSE FALSE  TRUE FALSE  TRUE                    2,4,8,12,19,22,24,25
>>>  28 FALSE FALSE  TRUE FALSE FALSE                            4,8,12,19,22
>>>  29 FALSE FALSE FALSE  TRUE  TRUE                      2,3,4,5,7,12,24,25
>>>  30 FALSE FALSE FALSE  TRUE FALSE                              3,4,5,7,12
>>>  31 FALSE FALSE FALSE FALSE  TRUE                               2,4,24,25
>>>  32 FALSE FALSE FALSE FALSE FALSE
>> The above table is clearly not what was expected: the subsets
>> in the last column are not a partition of the initial set of genes
>> (some ids appear in several rows).
>> Try this instead:
>>
>> makeVennTable <- function(sets)
>> {
>>    ## all 2^n TRUE/FALSE rows, one per region of the diagram
>>    mkAllLogicalVect <- function(length)
>>    {
>>      if (length == 0L)
>>        return(logical(0))
>>      ans0 <- mkAllLogicalVect(length - 1L)
>>      ans1 <- cbind(TRUE, ans0)
>>      ans2 <- cbind(FALSE, ans0)
>>      rbind(ans1, ans2)
>>    }
>>    ## intersection of an arbitrary number of sets
>>    minter.int <- function(...)
>>    {
>>      args <- list(...)
>>      if (length(args) == 0)
>>        return(integer(0))
>>      if (length(args) == 1)
>>        return(args[[1]])
>>      intersect(args[[1]], do.call(minter.int, args[-1]))
>>    }
>>    ## union of an arbitrary number of sets
>>    munion.int <- function(...)
>>    {
>>      unique(unlist(list(...)))
>>    }
>>    lm <- mkAllLogicalVect(length(sets))
>>    ## for each region: ids in every selected set ('ii' TRUE) but in
>>    ## none of the other sets ('ii' FALSE)
>>    parts <- apply(lm, MARGIN=1,
>>                 function(ii)
>>                 {
>>                   s1 <- do.call(minter.int, sets[ii])
>>                   s2 <- do.call(munion.int, sets[!ii])
>>                   part <- setdiff(s1, s2)
>>                   if (length(part) == 0)
>>                     return("")
>>                   paste(sort(part), collapse=",")
>>                 })
>>    data.frame(lm, parts)
>> }
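With 10 sets most of the 1024 rows will typically be empty. A minimal alternative sketch computes each gene's membership pattern directly and groups genes by pattern, so only the non-empty regions are produced; it should list the same non-empty parts as the corrected makeVennTable(), one row per region sorted by pattern string rather than all 2^n rows. vennParts() and its "T"/"F" pattern encoding are hypothetical choices, not part of any package.

```r
## Hypothetical alternative (not from any package): group the genes by
## their membership pattern and return only the non-empty regions.
vennParts <- function(sets)
{
    genes <- sort(unique(unlist(sets)))
    ## one row per gene, one logical column per set
    member <- matrix(sapply(sets, function(s) genes %in% s),
                     nrow = length(genes))
    ## encode each row as a pattern string, e.g. "TFTTF" for a gene
    ## found in sets 1, 3 and 4 only
    pattern <- apply(member, 1,
                     function(r) paste(ifelse(r, "T", "F"), collapse = ""))
    parts <- split(genes, pattern)
    data.frame(pattern = names(parts),
               parts = unname(vapply(parts, paste, character(1),
                                     collapse = ",")))
}

mysets <- list(c(1,5,12,4,9,29), c(4,11,3,18), c(22,4,12,19,8),
               c(7,12,4,5,3), c(25,24,4,2))
vennParts(mysets)
```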
>>
>> Then:
>>
>> > makeVennTable(mysets)
>>       X1    X2    X3    X4    X5   parts
>> 1   TRUE  TRUE  TRUE  TRUE  TRUE       4
>> 2   TRUE  TRUE  TRUE  TRUE FALSE
>> 3   TRUE  TRUE  TRUE FALSE  TRUE
>> 4   TRUE  TRUE  TRUE FALSE FALSE
>> 5   TRUE  TRUE FALSE  TRUE  TRUE
>> 6   TRUE  TRUE FALSE  TRUE FALSE
>> 7   TRUE  TRUE FALSE FALSE  TRUE
>> 8   TRUE  TRUE FALSE FALSE FALSE
>> 9   TRUE FALSE  TRUE  TRUE  TRUE
>> 10  TRUE FALSE  TRUE  TRUE FALSE      12
>> 11  TRUE FALSE  TRUE FALSE  TRUE
>> 12  TRUE FALSE  TRUE FALSE FALSE
>> 13  TRUE FALSE FALSE  TRUE  TRUE
>> 14  TRUE FALSE FALSE  TRUE FALSE       5
>> 15  TRUE FALSE FALSE FALSE  TRUE
>> 16  TRUE FALSE FALSE FALSE FALSE  1,9,29
>> 17 FALSE  TRUE  TRUE  TRUE  TRUE
>> 18 FALSE  TRUE  TRUE  TRUE FALSE
>> 19 FALSE  TRUE  TRUE FALSE  TRUE
>> 20 FALSE  TRUE  TRUE FALSE FALSE
>> 21 FALSE  TRUE FALSE  TRUE  TRUE
>> 22 FALSE  TRUE FALSE  TRUE FALSE       3
>> 23 FALSE  TRUE FALSE FALSE  TRUE
>> 24 FALSE  TRUE FALSE FALSE FALSE   11,18
>> 25 FALSE FALSE  TRUE  TRUE  TRUE
>> 26 FALSE FALSE  TRUE  TRUE FALSE
>> 27 FALSE FALSE  TRUE FALSE  TRUE
>> 28 FALSE FALSE  TRUE FALSE FALSE 8,19,22
>> 29 FALSE FALSE FALSE  TRUE  TRUE
>> 30 FALSE FALSE FALSE  TRUE FALSE       7
>> 31 FALSE FALSE FALSE FALSE  TRUE 2,24,25
>> 32 FALSE FALSE FALSE FALSE FALSE
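The partition property that the first version failed can be checked mechanically: every id must appear in exactly one non-empty part, and together the parts must cover all ids. isPartition() is a hypothetical helper, and it assumes ids were pasted with "," as makeVennTable() does.

```r
## Hypothetical check (not from any package): do the comma-separated
## strings in 'parts' partition the union of 'sets'?
isPartition <- function(parts, sets)
{
    parts <- as.character(parts)   # in case a factor column is passed
    ids <- unlist(strsplit(parts[parts != ""], ",", fixed = TRUE))
    all.ids <- as.character(unique(unlist(sets)))
    ## no id may appear twice, and every id must appear somewhere
    anyDuplicated(ids) == 0L && setequal(ids, all.ids)
}

mysets <- list(c(1,5,12,4,9,29), c(4,11,3,18), c(22,4,12,19,8),
               c(7,12,4,5,3), c(25,24,4,2))
## non-empty 'parts' from the corrected table above
isPartition(c("4", "12", "5", "1,9,29", "3", "11,18", "8,19,22", "7",
              "2,24,25"), mysets)
```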
>>
>> H.
>>
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at stat.math.ethz.ch
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives: 
>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>
> 

-- 
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M2-B876
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpages at fhcrc.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319


