[BioC] Getting annotation into GOstat

James W. MacDonald jmacdon at uw.edu
Tue Oct 30 15:24:55 CET 2012


Hi Naomi,

On 10/29/2012 9:01 PM, Naomi Altman wrote:
> Hi,
> I have used GOstat with annotations already formatted for 
> Bioconductor.  Now my colleague sent me a file with the annotations - 
> e.g.
>
> Gene Name,GO ID,GO Description
>
> GRMZM5G802875,,
> GRMZM2G116557,6355,"regulation of transcription, DNA-dependent"
> GRMZM2G116557,5634,nucleus
>
> What functions do I need to use to get this information into a 
> database so that I can run GOstat?

You don't say what species this is, and that turns out to be critical. 
Luckily my friend the googles says this is Zea mays, so I can give a 
partial answer.

To do a hypergeometric test on these data you need at the very least an 
org.Zm.eg.db package, and unfortunately there isn't one. Luckily Marc 
Carlson is a stud, and you can now build one with about two lines of code:

 > library(AnnotationForge)
 > makeOrgPackageFromNCBI(version = "1",
 +                        author = "me <me@my.org>",
 +                        maintainer = "me <me@my.org>",
 +                        outputDir = ".",
 +                        tax_id = "4577",
 +                        genus = "Zea",
 +                        species = "mays")

Wait for a while, and then do the usual installation steps on the 
generated package directory (R CMD build, then R CMD check, then 
R CMD INSTALL).
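
If you would rather stay inside R for the install step, a minimal sketch 
along these lines should work. It assumes the generated source directory 
sits in the current working directory and has a name matching 
org.*.eg.db; the exact name depends on the genus and species arguments, 
so the code looks it up by pattern instead of hard-coding it. Note that 
this skips the separate build and check steps.

```r
# Look for a generated org package source directory in the working directory.
# The name pattern is an assumption; check outputDir for the actual name.
candidates <- list.files(".", pattern = "^org\\..*\\.eg\\.db$")
candidates <- candidates[dir.exists(candidates)]

if (length(candidates) == 1) {
  # Install straight from the source directory (no repository involved).
  install.packages(candidates, repos = NULL, type = "source")
} else {
  message("Expected exactly one org.*.eg.db directory, found ",
          length(candidates))
}
```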

Then you can use the GOstats package, but since you don't have a 
chip-level .db package, you have to follow the instructions for using an 
org-level package. I don't recall offhand how that goes, but IIRC you 
just have to specify the geneIds and universeGeneIds that pertain to 
your experiment (these will be Entrez Gene IDs, so you have to use 
your new org.Zm.eg.db package to get them), as well as the org package 
as the annotation.
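
To make the idea of the test concrete, here is a rough base R sketch of 
a hypergeometric test for a single GO term, working directly from a file 
in your colleague's layout. This is not what GOstats does internally 
(GOstats also propagates annotations to ancestor terms and works from 
Entrez Gene IDs); it only illustrates the selected-genes-versus-universe 
calculation. The first three data rows come from the example above; the 
GENE_A to GENE_D rows, the selected gene list, and the helper name 
go_hyper_p are made up for illustration.

```r
# Toy annotation table in the colleague's layout. Rows for GENE_A..GENE_D
# are hypothetical and exist only to make the example non-trivial.
csv_text <- 'Gene Name,GO ID,GO Description
GRMZM5G802875,,
GRMZM2G116557,6355,"regulation of transcription, DNA-dependent"
GRMZM2G116557,5634,nucleus
GENE_A,5634,nucleus
GENE_B,5634,nucleus
GENE_C,6355,"regulation of transcription, DNA-dependent"
GENE_D,,
'
ann <- read.csv(text = csv_text, stringsAsFactors = FALSE)
names(ann) <- c("gene", "go_num", "go_desc")

# Drop genes without annotation and pad the numeric IDs to the
# standard "GO:0000000" form.
ann <- ann[!is.na(ann$go_num), ]
ann$go_id <- sprintf("GO:%07d", as.integer(ann$go_num))

# Universe: every gene with at least one GO annotation.
universe <- unique(ann$gene)
# Hypothetical list of "interesting" genes from an experiment.
selected <- c("GRMZM2G116557", "GENE_A")

# Over-representation p-value for one term: P(X >= x) under the
# hypergeometric distribution.
go_hyper_p <- function(term, ann, selected, universe) {
  term_genes <- unique(ann$gene[ann$go_id == term & ann$gene %in% universe])
  m <- length(term_genes)                     # universe genes with the term
  n <- length(universe) - m                   # universe genes without it
  k <- length(intersect(selected, universe))  # selected genes in the universe
  x <- length(intersect(selected, term_genes))  # selected genes with the term
  phyper(x - 1, m, n, k, lower.tail = FALSE)
}

p_nucleus <- go_hyper_p("GO:0005634", ann, selected, universe)
print(p_nucleus)
```

With the real data you would loop over all GO terms and correct for 
multiple testing, which is the bookkeeping GOstats takes care of for you.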

Best,

Jim
>
> Thanks,
> Naomi
>

-- 
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099


