[BioC] GO term enrichment
Marc Carlson
mcarlson at fhcrc.org
Thu Jul 1 19:53:37 CEST 2010
Hi Assa,
It really sounds like you should look at the vignette titled
"Hypergeometric Tests Using GOstats" from the GOstats package.
http://www.bioconductor.org/packages/release/bioc/html/GOstats.html
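
For orientation, the calculation behind a hypergeometric over-representation test can be sketched in base R with phyper(). This is only a rough illustration and not the GOstats interface; the four counts and all variable names below are made up, so replace them with the numbers from your own array and gene list.

```r
# Hypothetical counts, chosen only for illustration (not taken from your data).
universeSize <- 20000  # genes on the array that have any GO annotation
termSize     <- 150    # of those, genes annotated to the GO term of interest
selectedSize <- 400    # differentially regulated genes in one column (gene set)
overlap      <- 12     # selected genes that are annotated to the term

# Probability of seeing at least `overlap` annotated genes when drawing
# `selectedSize` genes at random, without replacement, from the universe.
# phyper(q, ...) with lower.tail = FALSE gives P(X > q), hence overlap - 1.
pValue <- phyper(overlap - 1, termSize, universeSize - termSize,
                 selectedSize, lower.tail = FALSE)
print(pValue)
```

You would repeat this once per GO term and per column, and then correct for multiple testing, for example with p.adjust().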
Marc
On 07/01/2010 02:58 AM, Assa Yeroslaviz wrote:
> Hello everybody,
>
> I have a table of microarray results, "genelist.txt", which looks like this:
> Probe Id      NAME       FC_set1    FC_set2    FC_set3    FC_set4
> A_51_P100021  Hivep3    1.048368  -1.085207  -1.013457   1.032816
> A_51_P100034  Mif4gd   -1.049719  -1.077773  -1.084012  -1.004941
> A_51_P100052  Slitrk2   1.339832   1.063053  -1.157675  -1.003128
> A_51_P100063  Lnx1      1.073604   1.010892  -1.058375   1.063377
> A_51_P100084  Unknown   1.084544  -1.258876  -1.092571  -1.058791
> ...
>
> The probe IDs are from the Agilent expression arrays. I extracted the gene names
> using biomaRt, and now I would like to find out whether any gene sets are
> overrepresented among the differentially regulated genes.
> First, I would like to see whether any GO terms are
> overrepresented in the gene list for each of the columns (gene sets).
> Second, I would like to look for enrichment of other gene sets among the
> differentially regulated genes in these lists (for example kinases,
> transcription factors, but also localization, protein domains, etc.).
>
> I would like your help in creating the gene sets, either from GO terms or from
> the other annotations.
>
> I know I can retrieve this data from biomaRt for each gene, for
> example:
>
> library(biomaRt)
>
> mart <- useMart("ensembl")
> ensembl <- useDataset("mmusculus_gene_ensembl", mart = mart)
>
> test <- read.delim("genelist.txt")
> geneset1 <- read.delim("geneset1_all_signal.txt")
> # probe IDs are in the first column
> genes <- as.character(geneset1[,1])
>
> # pass the probe ID vector (not the whole data frame) as the filter values
> geneNames <- getBM(attributes = c("go_biological_process_id", "name_1006",
> "agilent_wholegenome", "external_gene_id", "ensembl_gene_id", "entrezgene"),
> filters = "agilent_wholegenome", values = genes, mart = ensembl)
>
>
> > geneNames
>   go_biological_process_id  name_1006
> 1 GO:0007409                axonogenesis
> 2 GO:0006511                ubiquitin-dependent protein catabolic process
> 3 GO:0051260                protein homooligomerization
> 4 GO:0042787                protein ubiquitination during ubiquitin-dependent protein catabolic process
> 5 GO:0006417                regulation of translation
> 6 GO:0016070                RNA metabolic process
> 7 GO:0016070                RNA metabolic process
>   agilent_wholegenome  external_gene_id  ensembl_gene_id     entrezgene
> 1 A_51_P100052         Slitrk2           ENSMUSG00000036790  245450
> 2 A_51_P100063         Lnx1              ENSMUSG00000029228  16924
> 3 A_51_P100063         Lnx1              ENSMUSG00000029228  16924
> 4 A_51_P100063         Lnx1              ENSMUSG00000029228  16924
> 5 A_51_P100034         Mif4gd            ENSMUSG00000020743  69674
> 6 A_51_P100034         Mif4gd            ENSMUSG00000020743  69674
> 7 A_51_P100034         Mif4gd            ENSMUSG00000020743  NA
>
> Now I would like to create the gene sets according to these GO categories. I
> would like to get something like this, with each GO term followed by all the
> probes from the list that belong to it, on one line:
>
> GO:0007409 A_51_P100052 ...
> GO:0016070 A_51_P100034 ...
> GO:0006417 A_51_P100034 ...
>
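
One way to get that layout from a getBM() result is to group the probe column by the GO column with split(). The sketch below is a minimal base R example under some assumptions: the small data frame is a hand-typed stand-in for geneNames (values copied from the output above, so no biomaRt query is needed to run it), and geneSets and setLines are illustrative names.

```r
# Toy stand-in for the getBM() result, using the two relevant columns only.
geneNames <- data.frame(
  go_biological_process_id = c("GO:0007409", "GO:0006511", "GO:0051260",
                               "GO:0042787", "GO:0006417", "GO:0016070",
                               "GO:0016070"),
  agilent_wholegenome = c("A_51_P100052", "A_51_P100063", "A_51_P100063",
                          "A_51_P100063", "A_51_P100034", "A_51_P100034",
                          "A_51_P100034"),
  stringsAsFactors = FALSE
)

# Drop rows that carry no GO annotation (NA or empty string).
hasGO <- !is.na(geneNames$go_biological_process_id) &
  geneNames$go_biological_process_id != ""
annotated <- geneNames[hasGO, ]

# Named list: one element per GO term, holding its probe IDs without duplicates.
geneSets <- lapply(split(annotated$agilent_wholegenome,
                         annotated$go_biological_process_id), unique)

# One line per GO term: the GO ID followed by its probes, tab-separated.
setLines <- vapply(names(geneSets),
                   function(id) paste(c(id, geneSets[[id]]), collapse = "\t"),
                   character(1))
writeLines(setLines)
```

The named list is probably the more useful form for further analysis; writeLines(setLines, "genesets.txt") would write the one-line-per-term version to a file (the file name is only an example).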
> THX for the help
>
> Assa