[BioC] Overlapping genes in subsets of lists

Sean Davis sdavis2 at mail.nih.gov
Wed Oct 8 14:59:36 CEST 2008


On Wed, Oct 8, 2008 at 8:34 AM, Heike Pospisil
<pospisil at zbh.uni-hamburg.de> wrote:
> Hello there,
>
> I have 100 lists of differentially expressed genes, and I am trying to find
> genes overrepresented in these 100 lists (I call them a 'cluster of genes').
> What's worse, I expect not only one cluster of genes, but three or four or
> five of them. That is why a simple intersection() will not help. I wish I
> had a function that can select all genes which appear in 100% of 33 lists of
> genes (cluster 1), all genes which appear in 100% of 22 lists (cluster 2) and
> all genes which appear in 100% of the remaining 45 lists (cluster 3). (I hope
> my explanation is clear).
>
> Does anybody know a package or a strategy for defining such clusters?

Just a thought, but you could make a matrix with "gene lists" as the
columns (i.e., gene list 1 in column 1, gene list 2 in column 2, etc.)
and rows with the union of all genes.  Put a "1" in each cell for a
gene that is present in a gene list and "0" elsewhere.  Once you have
this matrix, you can use normal clustering methods to look for
patterns.  For example, you could produce a heatmap of these data and
look for blocks.
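
A minimal sketch of the matrix idea, written in Python purely for
illustration. The function names and the toy gene lists are hypothetical.
Grouping genes by identical 0/1 rows is only a simple stand-in for a real
clustering method: it finds exact blocks and will not tolerate noise.

```python
from collections import defaultdict


def presence_matrix(gene_lists):
    """Build a 0/1 matrix: one row per gene, one column per gene list.

    Rows cover the union of all genes, sorted for a stable order.
    """
    genes = sorted(set().union(*gene_lists))
    list_sets = [set(gl) for gl in gene_lists]
    matrix = [[1 if g in s else 0 for s in list_sets] for g in genes]
    return genes, matrix


def group_by_pattern(genes, matrix):
    """Group genes that share exactly the same presence/absence row.

    Assumption: exact matching replaces clustering for this toy case.
    """
    groups = defaultdict(list)
    for gene, row in zip(genes, matrix):
        groups[tuple(row)].append(gene)
    return dict(groups)


# Toy data (made up): four gene lists with two obvious blocks.
gene_lists = [
    ["A", "B", "X"],
    ["A", "B"],
    ["C", "D"],
    ["C", "D", "X"],
]

genes, matrix = presence_matrix(gene_lists)
blocks = group_by_pattern(genes, matrix)

for pattern, members in blocks.items():
    print(pattern, members)
```

On real data with 100 lists, exact row matches will be rare, so the same
matrix would more plausibly be passed to a hierarchical clustering or
heatmap routine, as suggested above.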

Sean


