[BioC] GO term enrichment

Thu Jul 1 19:53:37 CEST 2010

Hi Assa,

It really sounds like you should look at the vignette titled
"Hypergeometric Tests Using GOstats" from the GOstats package.

http://www.bioconductor.org/packages/release/bioc/html/GOstats.html
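
To give a rough idea of the kind of test the vignette title refers to, here
is a minimal base-R sketch of a hypergeometric over-representation test for a
single GO term. It does not use the GOstats API; the gene IDs, the universe,
and the variable names (universe, termGenes, selected, pOver) are all made up
for illustration.

```r
# Hypothetical toy data: a gene universe, the genes annotated to one GO term,
# and a list of selected (e.g. differentially regulated) genes.
universe  <- paste0("g", 1:100)
termGenes <- paste0("g", 1:10)
selected  <- paste0("g", c(1:5, 51:65))

x <- length(intersect(selected, termGenes))  # selected genes annotated to the term
m <- length(termGenes)                       # genes in the universe with the term
n <- length(universe) - m                    # genes in the universe without the term
k <- length(selected)                        # size of the selected list

# P(X >= x): chance of seeing at least x annotated genes among k genes
# drawn at random from the universe (over-representation p-value)
pOver <- phyper(x - 1, m, n, k, lower.tail = FALSE)
print(pOver)
```

In practice you would repeat this for every GO term and correct for multiple
testing; the choice of gene universe (here, everything on the array is the
usual assumption) strongly affects the result.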

  Marc

On 07/01/2010 02:58 AM, Assa Yeroslaviz wrote:
> Hello everybody,
>
> I have a table of microarray experiment results, "genelist.txt", which
> looks like this:
> Probe Id        NAME       FC_set1      FC_set2      FC_set3      FC_set4
> A_51_P100021    Hivep3     1.048368     -1.085207    -1.013457    1.032816
> A_51_P100034    Mif4gd     -1.049719    -1.077773    -1.084012    -1.004941
> A_51_P100052    Slitrk2    1.339832     1.063053     -1.157675    -1.003128
> A_51_P100063    Lnx1       1.073604     1.010892     -1.058375    1.063377
> A_51_P100084    Unknown    1.084544     -1.258876    -1.092571    -1.058791
> ...
>
> The probe IDs are from the Agilent expression arrays. I extracted the names
> using biomaRt, and now I would like to find out whether any gene sets are
> overrepresented among the differentially regulated genes.
> First, I would like to see whether any GO terms are overrepresented in
> these gene lists, for each of the columns (gene sets).
> Second, I would like to test these lists for enrichment of other gene sets
> among the differentially regulated genes (for example kinases,
> transcription factors, but also localization, protein domain, etc.).
>
> I would like your help in creating the gene sets of either GO terms or the
> other parameters.
>
> I know I can extract the data from biomaRt for each and every gene, for
> example:
>
> library(biomaRt)
>
> # Connect to the Ensembl mart and select the mouse gene dataset
> mart <- useMart("ensembl")
> ensembl <- useDataset("mmusculus_gene_ensembl", mart = mart)
>
> test <- read.delim("genelist.txt")
> geneset1 <- read.delim("geneset1_all_signal.txt")
> # The first column holds the Agilent probe IDs
> genes <- as.character(geneset1[,1])
>
> # Query by probe ID; pass the vector of IDs rather than the whole data frame
> geneNames <- getBM(attributes = c("go_biological_process_id", "name_1006",
> "agilent_wholegenome", "external_gene_id", "ensembl_gene_id", "entrezgene"),
> filters = "agilent_wholegenome", values = genes, mart = ensembl)
>
>
>> geneNames
>   go_biological_process_id name_1006
> 1               GO:0007409 axonogenesis
> 2               GO:0006511 ubiquitin-dependent protein catabolic process
> 3               GO:0051260 protein homooligomerization
> 4               GO:0042787 protein ubiquitination during ubiquitin-dependent protein catabolic process
> 5               GO:0006417 regulation of translation
> 6               GO:0016070 RNA metabolic process
> 7               GO:0016070 RNA metabolic process
>   agilent_wholegenome external_gene_id    ensembl_gene_id entrezgene
> 1        A_51_P100052          Slitrk2 ENSMUSG00000036790     245450
> 2        A_51_P100063             Lnx1 ENSMUSG00000029228      16924
> 3        A_51_P100063             Lnx1 ENSMUSG00000029228      16924
> 4        A_51_P100063             Lnx1 ENSMUSG00000029228      16924
> 5        A_51_P100034           Mif4gd ENSMUSG00000020743      69674
> 6        A_51_P100034           Mif4gd ENSMUSG00000020743      69674
> 7        A_51_P100034           Mif4gd ENSMUSG00000020743         NA
>
> Now I would like to create the gene sets according to these GO categories. I
> would like to get something like this:
>
> GO:0007409 A_51_P100052 ... the rest of the genes from this category in the
> list on one line
> GO:0016070 A_51_P100034 ...
> GO:0006417 A_51_P100034 ...
>
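
One way to get that layout from a getBM() result is base R's split(). The
sketch below is only an illustration: it rebuilds a small stand-in for
geneNames from the rows quoted above instead of querying biomaRt, and the
names geneSets and setLines are made up here.

```r
# Toy stand-in for the getBM() result (values copied from the output above)
geneNames <- data.frame(
  go_biological_process_id = c("GO:0007409", "GO:0006511", "GO:0051260",
                               "GO:0042787", "GO:0006417", "GO:0016070",
                               "GO:0016070"),
  agilent_wholegenome = c("A_51_P100052", "A_51_P100063", "A_51_P100063",
                          "A_51_P100063", "A_51_P100034", "A_51_P100034",
                          "A_51_P100034"),
  stringsAsFactors = FALSE
)

# Group probe IDs by GO term; unique() drops probes repeated within a term
geneSets <- lapply(
  split(geneNames$agilent_wholegenome, geneNames$go_biological_process_id),
  unique
)

# One line per GO term: the GO ID followed by its probes, space-separated
setLines <- paste(names(geneSets), sapply(geneSets, paste, collapse = " "))
writeLines(setLines)
```

geneSets is a named list (GO ID -> probe IDs), a convenient shape for looping
over terms; writeLines(setLines, "genesets.txt") would write the same lines to
a file (the file name is just an example).
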
> THX for the help
>
> Assa