[BioC] annotation and GO for non-model organism

Thu Sep 17 21:15:49 CEST 2009

Hi Ingunn,

First, you should determine whether or not your organism is one of our
supported organisms.  Since you describe it as a non-model organism, I
suspect it is not, but it's still worth checking this first.
If it is, then you should be able to get an organism-level package from
our repository and use GOstats in the usual way.  Checking this
is straightforward: call the
available.dbschemas() function in the AnnotationDbi package to see
whether your organism is supported by a schema.  If it is not, we have a new
workaround for you that works with the latest versions of the
AnnotationDbi, GSEABase, GOstats and Category packages, which are
presently in our development branch.

Since I suspect you will need the latter strategy, below is an example
of how you should be able to proceed.  It is very similar to how you
would use the GOstats package traditionally, and you should probably
read that package's vignette for a more detailed explanation before
attempting this.  Please note that in the following example
"frameData" is a data.frame with three columns: GO IDs,
evidence codes and gene IDs, in that order.  This is how you introduce
the specific annotation for your organism.  Also, be careful to ensure
that your gene IDs are the same type of ID as those in
your 'universeGeneIds' and 'geneIds', and use a type of ID
that is truly unique (I recommend something like Entrez Gene IDs).
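
Since your annotation comes as a tab-delimited file, here is a minimal sketch, in base R, of one way to turn it into a frameData of that shape. It rests on assumptions that are not from your description of the file: the gene IDs are in a column named 'GeneId'; the 'molecular_function', 'biological_process' and 'cellular_component' columns hold GO IDs (not term names) separated by ';'; and, because the file does not appear to carry evidence codes, every annotation is given the code 'IEA' (inferred from electronic annotation). The helper makeFrameData() is hypothetical, not part of any Bioconductor package.

```r
## Hypothetical helper (not part of any Bioconductor package): convert a
## one-row-per-gene annotation table into the three-column data.frame
## (GO ID, evidence code, gene ID) used as frameData for GOFrame().
## Assumptions: GO IDs are "GO:nnnnnnn" strings separated by 'sep', and the
## same evidence code ('evidence') is used for every annotation.
makeFrameData <- function(annot,
                          idCol = "GeneId",
                          goCols = c("molecular_function",
                                     "biological_process",
                                     "cellular_component"),
                          sep = ";",
                          evidence = "IEA") {
  ids <- as.character(annot[[idCol]])
  pieces <- lapply(goCols, function(col) {
    ## one list element per gene, holding that gene's GO IDs
    terms <- strsplit(as.character(annot[[col]]), sep, fixed = TRUE)
    go <- trimws(unlist(terms, use.names = FALSE))
    data.frame(go_id = go,
               evidence = rep(evidence, length(go)),
               gene_id = rep(ids, lengths(terms)),
               stringsAsFactors = FALSE)
  })
  frame <- do.call(rbind, pieces)
  ## drop NA/empty entries and anything that is not a well-formed GO ID
  frame <- frame[!is.na(frame$go_id) & grepl("^GO:[0-9]{7}$", frame$go_id), ]
  frame <- unique(frame)
  rownames(frame) <- NULL
  frame
}

## Toy stand-in for the tab-delimited file; with the real file you would
## use something like read.delim("annotation.txt", stringsAsFactors = FALSE)
## ("annotation.txt" is a placeholder name).
annot <- data.frame(
  GeneId = c("101", "102"),
  molecular_function = c("GO:0003674; GO:0005515", ""),
  biological_process = c("GO:0008150", "GO:0006810"),
  cellular_component = c(NA, "GO:0005575"),
  stringsAsFactors = FALSE
)
frameData <- makeFrameData(annot)
print(frameData)
```

The separator, column names and ID format in your file may well differ, so it is worth comparing a few rows of the result with the original file by hand before going further.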

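library("GOstats")
library("GSEABase")
library("AnnotationDbi")
## frameData: data.frame with columns GO ID, evidence code, gene ID.
## Set 'organism' to the name of your own species.
frame <- GOFrame(frameData, organism = "Homo sapiens")
allFrame <- GOAllFrame(frame)
gsc <- GeneSetCollection(allFrame, setType = GOCollection())
## 'genes' (IDs of interest) and 'universe' (all IDs on the array) must use
## the same type of gene ID as the third column of frameData.
params <- GSEAGOHyperGParams(name = "My Custom GSEA based annot Params",
                             geneSetCollection = gsc,
                             geneIds = genes,
                             universeGeneIds = universe,
                             ontology = "MF",
                             pvalueCutoff = 0.05,
                             conditional = FALSE,
                             testDirection = "over")
Over <- hyperGTest(params)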
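
Before calling hyperGTest(), it can help to confirm that 'genes', 'universe' and the gene IDs in frameData really are the same kind of ID; if they are not, the results will not mean what you expect. Below is a minimal base-R check. checkGeneIds() is a hypothetical helper, not part of GOstats, and it assumes the gene IDs sit in the third column of frameData as described above. The toy objects use made-up IDs and deliberately different names so they do not overwrite your real data.

```r
## Hypothetical helper (not part of GOstats): summarise how the selected
## gene IDs, the universe and the annotation in frameData line up.
## Assumes the gene IDs are in the third column of frameData.
checkGeneIds <- function(genes, universe, frameData) {
  genes <- as.character(genes)
  universe <- as.character(universe)
  annotated <- unique(as.character(frameData[[3]]))
  list(
    ## selected genes that are missing from the universe
    genesNotInUniverse = setdiff(genes, universe),
    ## fraction of universe IDs with at least one GO annotation
    fractionUniverseAnnotated = mean(universe %in% annotated),
    ## number of repeated IDs in the universe (ideally 0)
    duplicatedInUniverse = sum(duplicated(universe))
  )
}

## Toy example with made-up IDs
toyFrame <- data.frame(go_id = c("GO:0008150", "GO:0005575"),
                       evidence = c("IEA", "IEA"),
                       gene_id = c("101", "102"),
                       stringsAsFactors = FALSE)
toyUniverse <- c("101", "102", "103", "103")
toyGenes <- c("101", "999")
res <- checkGeneIds(toyGenes, toyUniverse, toyFrame)
str(res)
```

Many selected genes missing from the universe, or a very low annotated fraction, often points to mixed ID types (for example accession numbers in one place and Entrez Gene IDs in another).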

Please let me know if you have questions or comments.  This is a new
capability that we are adding so that we can provide better support for
non-model organisms.

  Marc

Ingunn Berget wrote:
> Dear List
>
> I want to do GO analysis on my microarray results, and have not done this before. We have a cDNA array for a non-model organism. The manufacturers of the array have provided annotations, so I have
> Accession number, gene description, gene synonyms, EC, molecular_function, biological_process, cellular_component, InterPro, KEGG, Pfam, EMBL, Ensembl, UniGene, RefSeq, PROSITE, GeneId, org
> and more in a tab-delimited txt file.
>
> So I suppose I have all the information I need. How can I use this with the Bioconductor packages?
>
> I have looked at the vignette for SQLForge in the AnnotationDbi package as suggested on this list before, but as it says "At the present time, it is possible to make annotation packages for the
> most common model organisms" I don't know how to proceed.
>
> Best regards
> Ingunn
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor