[BioC] Gene Lists and Genomes

Marc Carlson mcarlson at fhcrc.org
Thu May 5 19:19:29 CEST 2011


Hi Radhouane,

As a starting point, you might want to read the vignette for the GOstats 
package called "Hypergeometric Tests Using GOstats":

http://www.bioconductor.org/packages/2.8/bioc/vignettes/GOstats/inst/doc/GOstatsHyperG.pdf

If you are starting with Ensembl gene IDs, you can easily convert those 
to Entrez Gene IDs, and then use them with GOstats to run a set of 
hypergeometric tests for each organism.  Once you get that done, 
interpreting the other comparisons you suggested sounds a bit murkier, 
because different organisms have been annotated differently depending on 
how well studied they are.
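
To make the idea of a hypergeometric test concrete, here is a minimal sketch in plain Python of the over-representation p-value such a test computes for a single GO term or KEGG pathway. It is not the GOstats implementation. The function names and the gene identifiers in the usage lines are made up for illustration, and it assumes the gene universe, the gene list, and the term annotations already use the same kind of ID.

```python
from math import comb


def hypergeom_enrichment_pvalue(universe_size, annotated_in_universe,
                                list_size, annotated_in_list):
    """Upper-tail hypergeometric probability P(X >= annotated_in_list).

    X is the number of term-annotated genes seen when drawing list_size
    genes without replacement from a universe of universe_size genes, of
    which annotated_in_universe carry the term.
    """
    upper = min(list_size, annotated_in_universe)
    total = comb(universe_size, list_size)
    tail = sum(
        comb(annotated_in_universe, k)
        * comb(universe_size - annotated_in_universe, list_size - k)
        for k in range(annotated_in_list, upper + 1)
    )
    return tail / total


def term_enrichment(gene_list, term_genes, universe):
    """Illustrative helper: p-value for one term from sets of gene IDs."""
    universe = set(universe)
    # Only genes that belong to the universe take part in the test.
    selected = set(gene_list) & universe
    annotated = set(term_genes) & universe
    return hypergeom_enrichment_pvalue(
        len(universe), len(annotated), len(selected),
        len(selected & annotated),
    )


# Hypothetical IDs: 10 genes in the universe, 4 annotated to the term,
# 3 genes in the list, 2 of which are annotated.
universe = {f"g{i}" for i in range(10)}
term_genes = {"g0", "g1", "g2", "g3"}
gene_list = {"g0", "g1", "g9"}
p = term_enrichment(gene_list, term_genes, universe)
```

A small p-value here means the list contains more genes annotated to the term than random draws from the same universe would usually give. That is why the result depends on the choice of universe and on how thoroughly each organism has been annotated.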


   Marc


On 05/05/2011 09:25 AM, Radhouane Aniba wrote:
> Thanks Tim,
>
> Actually I have a list of predicted miRNA binding sites targeting specific
> genes in 4 genomes.
>
> What I am trying to find is the characteristics of these genes {g} for a
> specific genome {G} in terms of GO enrichment and KEGG enrichment.
>
> My starting point is a file with a list of ENSEMBL IDs, just that, no
> annotation and no scores, just the gene names.
>
> I am looking for the right package to do that. topGO, for example, does not
> seem to accept gene names alone; annotation is needed as well as other
> details. I am actually reading about all the packages that perform gene
> enrichment analyses.
>
> Rad
>
>
> 2011/5/5 Tim Triche, Jr. <tim.triche at gmail.com>
>
>> Hi Radhouane,
>>
>> You can get more specific answers if you ask more specific questions.  The
>> mathematical formulation of the test(s), and therefore the meaning of your
>> results, will depend directly on
>>
>> 1) the logic of the package you use to test for GO enrichment in a gene
>> list or lists
>> 2) the logic of the package you use to test for KEGG enrichment in a gene
>> list or lists
>>
>> A concise and useful description of the logical basis for hypergeometric
>> and binomial tests:
>>
>>
>> http://great.stanford.edu/help/index.php/Statistics#What_is_the_hypergeometric_test_formally.3F
>>
>> You mention GSEA simultaneously with GO/KEGG enrichment, thus perhaps it
>> would be best if you provide examples making your question more concrete, so
>> that others may benefit.  For example, the logic behind the similarly-named
>> GSA and GSEA procedures differs subtly.  On that note, you might find the
>> following two discussions helpful:
>>
>>
>> http://www.broadinstitute.org/cancer/software/gsea/wiki/index.php/FAQ#What_is_the_difference_between_GSEA_and_an_overlap_statistic_.28hypergeometric.29_analysis_tool.3F
>>
>> http://www-stat.stanford.edu/~tibs/ftp/GSA.pdf
>>
>> If you haven't already, you will want to read the original GSEA paper in
>> PNAS for background.
>>
>> Best regards,
>>
>> --t
>>
>>
>> On Thu, May 5, 2011 at 8:20 AM, Radhouane Aniba <aradwen at gmail.com> wrote:
>>
>>> Hello everyone,
>>>
>>> I am aware of some Bioconductor packages that are useful for measuring
>>> GO or KEGG gene enrichment in a given file for a given genome: GOstats,
>>> GSEA, etc.
>>>
>>> My question is: I am working with 4 different genomes and have a gene
>>> list for each of them. For a given gene in each list, for each genome, I
>>> want to:
>>>
>>> - Extract the GO term with its p-value (what does the p-value actually
>>> mean here? Enrichment, OK, but how is it calculated?)
>>> - Extract the KEGG pathway and its p-value as well
>>>
>>> Thanks
>>>
>>> Rad
>>>
>>
>>
>> --
>> If people do not believe that mathematics is simple, it is only because
>> they do not realize how complicated life is.
>> John von Neumann<http://www-groups.dcs.st-and.ac.uk/%7Ehistory/Biographies/Von_Neumann.html>
>>
>>
>


