[BioC] Overlapping genes in subsets of lists

Wed Oct 8 15:23:52 CEST 2008

I would pool all the lists and use the table() function in R, which
will tell you how many lists each gene appears in. If you have 100
lists, the maximum frequency is 100, as long as you make each gene
unique within any given list.

Then you can sort by frequency to see which genes come up most often.
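
A minimal sketch of the counting and sorting steps, assuming the gene
lists are held in a named R list of character vectors. The toy data and
the variable names (gene_lists, gene_freq) are made up for illustration:

```r
# Toy data: hypothetical gene lists (list names and gene IDs are made up)
gene_lists <- list(
  listA = c("g1", "g2", "g3", "g2"),  # g2 is duplicated on purpose
  listB = c("g2", "g3", "g4"),
  listC = c("g3", "g5")
)

# Make each gene unique within its list, so it counts at most once per list
unique_lists <- lapply(gene_lists, unique)

# Pool all lists and count in how many lists each gene appears
gene_freq <- table(unlist(unique_lists))

# Sort by frequency, most frequent genes first
gene_freq_sorted <- sort(gene_freq, decreasing = TRUE)
print(gene_freq_sorted)
```

With real data, a gene whose count equals length(gene_lists) appears in
every list.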

Another approach I have used is to hierarchically cluster the lists,
which will tell you which gene lists have the most genes in common.
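
One way this clustering could be sketched in base R is shown here. The
distance measure, the linkage method, and the number of groups k are
assumptions of this example, not something specified above, and the toy
data are made up:

```r
# Toy data: hypothetical gene lists (list names and gene IDs are made up)
gene_lists <- list(
  l1 = c("a", "b", "c"),
  l2 = c("a", "b", "c", "d"),
  l3 = c("a", "b", "e"),
  l4 = c("x", "y", "z"),
  l5 = c("x", "y", "w")
)

all_genes <- sort(unique(unlist(gene_lists)))

# Binary membership matrix: one row per list, one column per gene
membership <- t(sapply(gene_lists, function(g) as.integer(all_genes %in% g)))
colnames(membership) <- all_genes

# "binary" distance in dist() is a Jaccard-type distance:
# lists sharing many genes end up close together
d <- dist(membership, method = "binary")
hc <- hclust(d, method = "average")

# Cut the tree into k groups of lists (k = 2 is an assumption for the toy data)
groups <- cutree(hc, k = 2)

# For each group of lists, keep the genes present in every list of the group
core_genes <- lapply(split(gene_lists, groups),
                     function(ls) Reduce(intersect, ls))
print(core_genes)
```

Calling plot(hc) draws the dendrogram, which may help in choosing how
many groups of lists to cut the tree into.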

Hope this helps,

Tom

On Oct 8, 2008, at 8:34 AM, Heike Pospisil wrote:

> Hello there,
>
> I have 100 lists of differentially expressed genes, and I am trying to
> find genes overrepresented in these 100 lists (I call them a 'cluster
> of genes'). What's worse, I expect not just one cluster of genes, but
> three, four, or five of them. That is why a simple intersection() will
> not help. I wish I had a function that can select all genes which
> appear in 100% of 33 lists of genes (cluster 1), all genes which appear
> in 100% of 22 lists (cluster 2), and all genes which appear in 100% of
> the remaining 45 lists (cluster 3). (I hope my explanation is clear.)
>
> Does anybody know a package or a strategy for defining such clusters?
>
> Thanks and best,
> Heike
> -- 
> Dr. Heike Pospisil      | pospisil at zbh.uni-hamburg.de
> University of Hamburg   | Center for Bioinformatics
> Bundesstrasse 43        | 20146 Hamburg, Germany
> phone:+49-40-42838-7303 | fax: +49-40-42838-7312