[BioC] Help using ENSMUSG ids in GOstats
James W. MacDonald
jmacdon at med.umich.edu
Mon May 12 16:44:39 CEST 2008
Perhaps this will help a bit.
Loading required package: AnnotationDbi
Loading required package: Biobase
Loading required package: tools
Welcome to Bioconductor
Vignettes contain introductory material. To view, type
'openVignette()'. To cite Bioconductor, see
'citation("Biobase")' and for packages 'citation(pkgname)'.
Loading required package: DBI
Loading required package: RSQLite
 "org.Mm.eg" "org.Mm.eg_dbconn" "org.Mm.eg_dbfile"
 "org.Mm.eg_dbInfo" "org.Mm.eg_dbschema" "org.Mm.egACCNUM"
 "org.Mm.egACCNUM2EG" "org.Mm.egALIAS2EG" "org.Mm.egCHR"
 "org.Mm.egCHRLENGTHS" "org.Mm.egCHRLOC" "org.Mm.egENSEMBL"
 "org.Mm.egENSEMBL2EG" "org.Mm.egENZYME" "org.Mm.egENZYME2EG"
 "org.Mm.egGENENAME" "org.Mm.egGO" "org.Mm.egGO2ALLEGS"
 "org.Mm.egGO2EG" "org.Mm.egMAP" "org.Mm.egMAP2EG"
 "org.Mm.egMAPCOUNTS" "org.Mm.egMGI" "org.Mm.egMGI2EG"
 "org.Mm.egORGANISM" "org.Mm.egPATH" "org.Mm.egPATH2EG"
 "org.Mm.egPFAM" "org.Mm.egPMID" "org.Mm.egPMID2EG"
 "org.Mm.egPROSITE" "org.Mm.egREFSEQ" "org.Mm.egREFSEQ2EG"
 "org.Mm.egSYMBOL" "org.Mm.egSYMBOL2EG" "org.Mm.egUNIGENE"
You will probably also need to make use of the revmap() function. If we
assume here that you have a character vector of Ensembl IDs called ENSMUSG:
gns <- mget(ENSMUSG, revmap(org.Mm.egENSEMBL))
will give you a list of Entrez Gene IDs, one element per Ensembl ID.
GOstats needs a character vector of unique Entrez Gene IDs, so check for
Ensembl IDs that map to more than one Entrez Gene ID (there is no
guarantee of a one-to-one mapping) and then remove duplicates. Simply
wrapping the above in unlist() is probably not what you want, because it
keeps every Entrez Gene ID from the one-to-many mappings.
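One way to do that check, as a rough sketch only. The list below is a made-up stand-in for what mget() returns (the IDs are hypothetical), and it assumes mget() was called with ifnotfound = NA so that unmapped Ensembl IDs show up as NA instead of raising an error. Dropping ambiguous mappings is just one possible policy; you may prefer to resolve them by hand.

```r
# Toy stand-in for the list returned by mget(); all IDs are invented.
ens2eg <- list(
  ENSMUSG_A = "101",            # one-to-one mapping
  ENSMUSG_B = c("202", "203"),  # one Ensembl ID, two Entrez Gene IDs
  ENSMUSG_C = "101",            # a second Ensembl ID hitting the same gene
  ENSMUSG_D = NA_character_     # no mapping found
)

# Number of Entrez Gene IDs each Ensembl ID maps to (NA counts as zero).
n_hits <- vapply(ens2eg, function(x) sum(!is.na(x)), integer(1))

# Assumed policy: keep only unambiguous mappings, then drop duplicates.
one_to_one <- ens2eg[n_hits == 1]
entrez <- unique(unlist(one_to_one, use.names = FALSE))

# Ensembl IDs that need a closer look before they are used.
ambiguous <- names(ens2eg)[n_hits > 1]
```

Here entrez is the character vector you would pass as geneIds, and ambiguous lists the Ensembl IDs that were set aside.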
The same holds true for the universe, which is the set of genes that
could have been selected from your chip. Once you have those things, the
procedure is quite straightforward. An example with fake data:
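First just grab some arbitrary Entrez Gene IDs (the first column of the
table, taken from the top rows rather than sampled at random):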
> gns <- unique(toTable(org.Mm.egENSEMBL)[1:100,1])
> univ <- unique(toTable(org.Mm.egENSEMBL)[1:1000,1])
Now do the analysis:
> param <- new("GOHyperGParams", geneIds = gns, universeGeneIds = univ,
ontology = "BP", annotation = "org.Mm.eg.db")
> hyp <- hyperGTest(param)
           GOBPID     Pvalue       OddsRatio
GO:0007229 GO:0007229 9.168712e-11 107.987805
GO:0010033 GO:0010033 1.255989e-06  25.192157
GO:0042391 GO:0042391 6.797840e-06   9.590361
GO:0007166 GO:0007166 1.404809e-05   2.941145
GO:0007190 GO:0007190 5.915149e-05  45.738636
GO:0031279 GO:0031279 5.915149e-05  45.738636
           ExpCount   Count Size
GO:0007229  1.2413793    11   12
GO:0010033  1.1379310     8   11
GO:0042391  2.0689655    10   20
GO:0007166 15.9310345    32  154
GO:0007190  0.6206897     5    6
GO:0031279  0.6206897     5    6
           Term
GO:0007229 integrin-mediated signaling pathway
GO:0010033 response to organic substance
GO:0042391 regulation of membrane potential
GO:0007166 cell surface receptor linked signal transduction
GO:0007190 activation of adenylate cyclase activity
GO:0031279 regulation of cyclase activity
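
In case the columns are unfamiliar: as far as I understand it, the P-value for each term is an upper-tail hypergeometric probability, and ExpCount is the number of selected genes you would expect to carry the term by chance. A base-R sketch with hypothetical universe and selection sizes (the real sizes depend on how many of your genes have any BP annotation, so this will not reproduce the numbers above):

```r
# Hypothetical counts, for illustration only.
n_universe <- 1000L  # universe genes with annotation in the ontology
n_selected <- 100L   # selected genes (a subset of the universe)
size  <- 12L         # universe genes annotated to the term ("Size")
count <- 11L         # selected genes annotated to the term ("Count")

# Expected number of selected genes annotated to the term ("ExpCount").
exp_count <- n_selected * size / n_universe

# Probability of seeing `count` or more annotated genes among the
# selected genes when drawing without replacement from the universe.
p_value <- phyper(count - 1L, size, n_universe - size, n_selected,
                  lower.tail = FALSE)
```

Note the count - 1L: phyper() with lower.tail = FALSE gives P(X > q), so subtracting one yields P(X >= count).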
John Reid wrote:
> Robert Gentleman wrote:
>>>> I am also guessing you have not searched the email list archives
>>>> for any of the several previous discussions (that is a good place to
>>> I did search the email list archives. Nothing came up. Can you
>>> suggest a good search term?
>> GOstats seems like a good starting place. Again, you seem not to
>> want to say what you did search on, so I have no idea why nothing came
>> up. The question has been asked quite a few times.
> I did search on GOstats, that certainly didn't help me find an
> annotation package. All the GOstats documentation says is that I need an
> annotation package. It does not help the user determine how to find the
> correct one. I'm not saying it should, just that this information is not
> easy to find anywhere else either.
>> Given that you have mouse genes, then I think you might be able to
>> rule out most of the annotation packages. The BioC views let you
>> select an organism, which greatly reduces the set you would need to
>> look at.
>> I get to this place with about 3 clicks from the top of the BioC page.
>> And then since you don't have an array it seems unlikely that any of
>> the array specific packages would be what you want. I hope with a few
>> minutes work you would have ended up at org.Mm.eg.db, which you may be
>> able to adapt to your needs. You may need some other tool (such as
>> biomaRt) to map from whatever identifiers you are using to those in
>> the annotation package (or they might be there already, again you
>> haven't given us much of anything to work with).
> I don't understand why you keep saying I haven't given you much to work
> with. The question surely is: Are ENSMUSG identifiers mapped in an
> annotation package so that I can use them in GOstats? This seemed clear
> to me in the first list post. Perhaps I have misunderstood some of the
> issues but at the moment I don't see what. Maybe you could enlighten me?
> I did end up at org.Mm.eg.db myself also in a few clicks but it
> certainly doesn't use Ensembl identifiers, its description clearly
> states Entrez genes. So like you say I have extra work to do to map the
> Thanks for the help,
James W. MacDonald, M.S.
Affymetrix and cDNA Microarray Core
University of Michigan Cancer Center
1500 E. Medical Center Drive
Ann Arbor MI 48109