[BioC] Predicted overlaps of 3 gene lists

Edwin Groot edwin.groot at biologie.uni-freiburg.de
Thu Nov 3 11:48:08 CET 2011

Hello all,
I had an aha moment last night while trying to count what
would be in a Venn diagram.
When calculating the expected overlaps of all pairwise combinations of
3 gene lists, I found that the expected 3-way overlap is the
combinatorial extension of the 2-way overlap formula.

#Let Array Size = AS
#Let List 1 Size = L1S
#Let List 2 Size = L2S
#Let List 3 Size = L3S
#Expected overlap of list 1 with list 2 = EO12 = (L1S*L2S)/AS
#Expected overlap of list 1 with list 3 = EO13 = (L1S*L3S)/AS
#Expected overlap of list 2 with list 3 = EO23 = (L2S*L3S)/AS
#The expected 3-way overlap extends the 2-way formula by one more factor:
#Expected 3-way overlap = EO123 = (L1S*L2S*L3S)/AS^2
#(A gene is in list i with probability LiS/AS; if the lists are independent,
#EO123 = AS*(L1S/AS)*(L2S/AS)*(L3S/AS), which simplifies to the line above.)
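A small R sketch of that combinatorial extension, assuming the lists are independent random draws from the same array: the expected number of genes shared by any k of the lists is prod(sizes)/AS^(k-1). The helper name expected_overlap() is made up for illustration; combn() is base R.

```r
# Hypothetical helper: expected overlap of any subset of lists, assuming each
# list is an independent random draw of genes from the same array of size AS.
expected_overlap <- function(sizes, AS) {
  # A gene lands in list i with probability sizes[i] / AS, so the expected
  # number of genes in all k lists is AS * prod(sizes / AS),
  # which simplifies to prod(sizes) / AS^(k - 1).
  prod(sizes) / AS^(length(sizes) - 1)
}

sizes <- c(L1S = 100, L2S = 150, L3S = 200)
AS <- 1000

# All pairwise overlaps (EO12, EO13, EO23): combn() applies the helper to each pair
pairwise <- combn(sizes, 2, FUN = expected_overlap, AS = AS)
names(pairwise) <- combn(names(sizes), 2, FUN = paste, collapse = "&")
pairwise                      # L1S&L2S = 15, L1S&L3S = 20, L2S&L3S = 30
expected_overlap(sizes, AS)   # EO123 = 3
```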

#Example: the predicted overlaps for L1S = 100, L2S = 150, L3S = 200 and
#AS = 1000 (EO12 = 15, EO13 = 20, EO23 = 30, EO123 = 3), laid out as a
#2x2x2 table of expected counts.
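#Cells fill with List1 varying fastest, then List2, then List3;
#e.g. the first value, 3, is the in1/in2/in3 cell (EO123).
c.table <- c(3,27, 17,153,  12,108, 68,612)
dim(c.table) <- c(2,2,2)
dimnames(c.table) <- list(List1=c("in1","not1"), List2=c("in2","not2"),
                          List3=c("in3","not3"))
c.table
sum(c.table)              #equals the array size, 1000
#Cochran-Mantel-Haenszel test of List1 vs List2, stratified by List3
mantelhaen.test(c.table)
#p=1 exactly, as expected for a table of expected counts.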
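Why p = 1 exactly: on a 2x2x2 table, mantelhaen.test() asks whether List1 and List2 are associated within each level of List3. A minimal base-R check of the per-stratum quantities behind that test is sketched here; stratum_or and delta are illustrative names, not part of any package.

```r
c.table <- array(c(3,27, 17,153,  12,108, 68,612), dim = c(2, 2, 2),
                 dimnames = list(List1 = c("in1","not1"),
                                 List2 = c("in2","not2"),
                                 List3 = c("in3","not3")))

# Odds ratio of List1 vs List2 within each List3 stratum: (a*d) / (b*c)
stratum_or <- apply(c.table, 3, function(s) (s[1, 1] * s[2, 2]) / (s[1, 2] * s[2, 1]))
stratum_or   # in3 = 1, not3 = 1

# Observed minus expected count of the in1/in2 cell in each stratum,
# the differences the Mantel-Haenszel statistic sums over strata
delta <- apply(c.table, 3, function(s) s[1, 1] - sum(s[1, ]) * sum(s[, 1]) / sum(s))
delta        # in3 = 0, not3 = 0
```

Both odds ratios are exactly 1 and both differences are exactly 0, which is consistent with the p = 1 reported above.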

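#A shortcut for getting the expected values (under mutual independence) of an
#observed contingency table is independence_table() from the vcd package.
library(vcd)   #install.packages("vcd") first if needed
ec.table <- independence_table(c.table, frequency="absolute")
ec.table       #should reproduce c.table, since c.table already holds expected counts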
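For readers without vcd, the same expected counts can be computed in base R from the one-way margins of the observed table. This is a sketch for the 3-way case only; expected_mutual() is a hypothetical helper, not vcd's own implementation.

```r
# Hypothetical helper: expected counts of a 3-way table under mutual
# independence, i.e. n times the product of the three one-way margin proportions.
expected_mutual <- function(x) {
  n <- sum(x)
  m1 <- apply(x, 1, sum) / n   # proportion of genes in / not in list 1
  m2 <- apply(x, 2, sum) / n   # ... list 2
  m3 <- apply(x, 3, sum) / n   # ... list 3
  e <- n * outer(outer(m1, m2), m3)
  dimnames(e) <- dimnames(x)
  e
}

c.table <- array(c(3,27, 17,153,  12,108, 68,612), dim = c(2, 2, 2),
                 dimnames = list(List1 = c("in1","not1"),
                                 List2 = c("in2","not2"),
                                 List3 = c("in3","not3")))
all.equal(expected_mutual(c.table), c.table)   # TRUE: c.table is already expected
```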

To some this might seem trivial, but to the
non-statistician/mathematician/programmer this was an exciting
discovery.

This can be extended to higher-dimensional contingency tables (more than
3 lists), but I shall wait for another sleepless night to test it.
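An untested sketch of that extension, under the same independence assumption: for k lists, each cell is the product of the k "in" or "not in" counts divided by AS^(k-1), and the all-"in" cell is the k-way expected overlap. expected_table_k() is a hypothetical helper; with the three list sizes above it reproduces c.table.

```r
# Hypothetical helper: full 2 x 2 x ... x 2 table of expected counts for k
# independent lists drawn from one array of size AS.
expected_table_k <- function(sizes, AS) {
  k <- length(sizes)
  # For each list, the number of genes in and not in that list
  parts <- lapply(sizes, function(s) c(s, AS - s))
  # Multiply every combination (list 1 varies fastest, as in c.table)
  counts <- Reduce(outer, parts) / AS^(k - 1)
  dimnames(counts) <- setNames(
    lapply(seq_len(k), function(i) paste0(c("in", "not"), i)),
    paste0("List", seq_len(k)))
  counts
}

tab3 <- expected_table_k(c(100, 150, 200), AS = 1000)        # same cells as c.table
tab4 <- expected_table_k(c(100, 150, 200, 250), AS = 1000)
tab4["in1", "in2", "in3", "in4"]   # 100*150*200*250 / 1000^3 = 0.75
sum(tab4)                          # 1000, the array size
```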

Regards,
--
Dr. Edwin Groot, postdoctoral associate
AG Laux
Institut fuer Biologie III
Schaenzlestr. 1
79104 Freiburg, Deutschland
+49 761-2032948