[BioC] Getting annotation into GOstat
James W. MacDonald
jmacdon at uw.edu
Tue Oct 30 15:24:55 CET 2012
On 10/29/2012 9:01 PM, Naomi Altman wrote:
> I have used GOstat with annotations already formatted for
> Bioconductor. Now my colleague sent me a file with the annotations -
> Gene Name,GO ID,GO Description
> GRMZM2G116557,6355,"regulation of transcription, DNA-dependent"
> What functions do I need to use to get this information into a
> database so that I can run GOstat?
You don't say what species this is, and that turns out to be critical.
Luckily my friend the googles says this is Zea mays, so I can give an
answer.
To do a hypergeometric test on these data you need at the very least an
org.Zm.eg.db package, and unfortunately there isn't one. Luckily Marc
Carlson is a stud, and you can now build one with about two lines of code:
    library(AnnotationForge)  # provides makeOrgPackageFromNCBI()
    makeOrgPackageFromNCBI(version = "1", author = "me <me@my.org>",
                           maintainer = "me <me@my.org>", outputDir = ".",
                           tax_id = "4577", genus = "Zea", species = "mays")
Wait for a while, and then do the usual installation steps (R CMD build,
then R CMD check, then R CMD INSTALL).
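
If you would rather stay inside R, something along these lines should
also install the generated source directory. This is only a sketch: the
directory name org.Zmays.eg.db is my assumption about what the build
step writes into outputDir, so check what actually appears there and
adjust the name.

```r
# ASSUMPTION: makeOrgPackageFromNCBI() wrote a source package directory
# with this name into the current working directory. Adjust if needed.
pkg_dir <- "./org.Zmays.eg.db"

if (dir.exists(pkg_dir)) {
  # repos = NULL with type = "source" installs from a local source tree
  install.packages(pkg_dir, repos = NULL, type = "source")
} else {
  message("Package directory not found: ", pkg_dir)
}
```

Note that this skips the R CMD check step, so it is a shortcut rather
than a replacement for the full sequence.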
Then you can use the GOstats package, but since you don't have a
chip-level .db package, you have to follow the instructions for using an
org level package. I don't recall offhand how that goes, but IIRC you
just have to specify the geneIDs and universeGeneIDs that pertain to
your experiment (and these will be Entrez Gene IDs, so you have to use
your new org.Zm.eg.db package to get them), as well as the org package.
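
A rough sketch of what that might look like is below. Several things
here are assumptions on my part and not something I have tested: the
package name org.Zmays.eg.db, the two input files (hypothetical, one
gene name per line), the idea that your gene names can be looked up
under the SYMBOL keytype, and that GOstats will accept a home-built org
package as the annotation. Treat it as a starting point.

```r
library(GOstats)          # hyperGTest() and the GOHyperGParams class
library(org.Zmays.eg.db)  # ASSUMPTION: name of the package built above

# Hypothetical input files: one gene name per line.
selected_names <- readLines("selected_genes.txt")
universe_names <- readLines("all_assayed_genes.txt")

# Map gene names to Entrez Gene IDs via the org package.
# ASSUMPTION: the names are stored under the "SYMBOL" keytype; check
# keytypes(org.Zmays.eg.db) and pick whichever one matches your IDs.
to_entrez <- function(gene_names) {
  tab <- AnnotationDbi::select(org.Zmays.eg.db, keys = gene_names,
                               columns = "ENTREZID", keytype = "SYMBOL")
  unique(na.omit(tab$ENTREZID))
}
selected_ids <- to_entrez(selected_names)
universe_ids <- to_entrez(universe_names)

# Set up and run the hypergeometric test for over-represented BP terms.
params <- new("GOHyperGParams",
              geneIds = selected_ids,
              universeGeneIds = universe_ids,
              annotation = "org.Zmays.eg.db",
              ontology = "BP",
              pvalueCutoff = 0.05,
              conditional = FALSE,
              testDirection = "over")
result <- hyperGTest(params)
head(summary(result))
```

The ontology, cutoff, and test direction are just example settings;
change them to suit the analysis.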
James W. MacDonald, M.S.
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099