[BioC] Creating annotation package with a new database schema

Hervé Pagès hpages at fhcrc.org
Tue Oct 23 00:04:56 CEST 2012


Hi Fabian,

On 10/22/2012 05:57 AM, Fabian Grammes wrote:
> Dear List
>
> I am working with Atlantic salmon and am highly interested to make a
> custom annotation package for the
> microarray that I am using.
>
> I've worked through the tutorial from Gabor Csardi ("Creating an
> annotation package with a new database
> schema" ), which was very helpful. However, I am struggling to implement
> the bimap objects to access
> the GO annotations that I have in the DB.
>
> The GO data is stored in 6 tables (BP, BP_all, MF, MF_all, CC, CC_all)
> looking like the format that I found for
> the organism packages in BioC:
> ID        GOID            evi
> 6092    GO:0000910        IEA
> 6092    GO:0040035        IEA
> 6092    GO:0000398        IEA
>
> So if someone could help me/ point me to the correct way how to
> implement the GO mappings
> in an annotation package that would be great.

If you look for example at the hgu95av2.db package, it provides 3
predefined Bimaps for accessing the GO data: hgu95av2GO (GO map),
hgu95av2GO2PROBE (GO2PROBE map), and hgu95av2GO2ALLPROBES (GO2ALLPROBES
map). The 1st is a direct map, the 2nd and 3rd are reverse maps:

   > direction(hgu95av2GO)
   [1] 1
   > direction(hgu95av2GO2PROBE)
   [1] -1
   > direction(hgu95av2GO2ALLPROBES)
   [1] -1

All of them are of class "ProbeGo3AnnDbBimap".
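
To make the direct/reverse distinction concrete, here is a small base-R analogy. It does not use the Bimap API: it only shows the same gene-to-GO links read in both directions. The first three rows are the ones from the question; the fourth row (ID "7001") is a hypothetical extra gene, added only so that one GO ID is shared by two genes.

```r
## Rows 1-3 come from the question; row 4 (ID "7001") is hypothetical.
go_bp <- data.frame(
    ID   = c("6092", "6092", "6092", "7001"),
    GOID = c("GO:0000910", "GO:0040035", "GO:0000398", "GO:0000398"),
    evi  = c("IEA", "IEA", "IEA", "IEA"),
    stringsAsFactors = FALSE
)

## Direct map: left key (gene ID) -> right keys (GO IDs).
direct_map <- split(go_bp$GOID, go_bp$ID)

## Reverse map: the same links, read from right to left.
reverse_map <- split(go_bp$ID, go_bp$GOID)

direct_map[["6092"]]
reverse_map[["GO:0000398"]]
```

A Bimap behaves similarly when coerced with as.list(), except that the links stay in the DB and are only fetched on demand.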

The predefined Bimaps are created at load-time: the direct maps
with a call to AnnotationDbi:::createAnnDbBimaps(), and the reverse
maps by "manually" reversing some of the direct maps returned by
createAnnDbBimaps().

So you need to add an entry for the GO map to the list of "seeds"
passed to createAnnDbBimaps(). In your case this entry needs to look
something like (assuming ID is your internal id for genes):

   seeds <- list(
     ...
     list(
         objName="GO",
         Class="ProbeGo3AnnDbBimap",
         L2Rchain=list(
             list(
                 tablename="probes",
                 Lcolname="probe_id",
                 Rcolname="gene_id",
                 filter="{is_multiple}='0'"
             ),
             list(
                 tablename="genes",
                 Lcolname="gene_id",
                 Rcolname="ID"
             ),
             list(
                 Lcolname="ID",
                 tagname=c(Evidence="{evi}"),
                 Rcolname="GOID",
                 Rattribnames=c(Ontology="NULL")
             )
         ),
         rightTables=c(BP="BP", CC="CC", MF="MF")
     )
     ...
   )

Then:

   ## createAnnDbBimaps() is not exported, hence the ':::'
   ann_objs <- AnnotationDbi:::createAnnDbBimaps(seeds, seed0)

where 'seed0' is defined by something like:

   seed0 <- list(objTarget="chip <name_of_your_chip>",
                 datacache=datacache)

and 'datacache' is the environment that will be used for package-level
caching of the data loaded from the DB (use NULL for no caching). I'm
assuming those extra details, which are not GO-specific, are covered
in Gabor's document, but I don't know for sure.

Then you can append the reverse maps to 'ann_objs' with something like:

   ## Append GO2PROBE map:
   map <- ann_objs$GO
   map <- revmap(map)
   map@objName <- "GO2PROBE"
   ann_objs$GO2PROBE <- map

   ## Append GO2ALLPROBES map:
   map <- ann_objs$GO2PROBE
   map@rightTables <- c(BP="BP_all", CC="CC_all", MF="MF_all")
   map@objName <- "GO2ALLPROBES"
   ann_objs$GO2ALLPROBES <- map

All this needs to happen at load-time (via the .onLoad hook). Again I'm
focusing on the GO-specific part of the story here, assuming that you've
already managed to create the non-GO specific maps (thanks to Gabor's
document).
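
A minimal sketch of how these pieces might fit together in the package's R/zzz.R, loosely following the layout of the standard annotation packages. It has not been run against a custom schema. The package name "mysalmonchip.db", the chip name, and the SQLite file name are placeholders, and the table and column names in the seed are assumed to match your DB. Storing the connection in 'datacache' under the name "dbconn" mirrors what the standard packages do and is an assumption about what the Bimaps expect here.

```r
## Sketch of R/zzz.R for a hypothetical package "mysalmonchip.db".
## Package name, chip name and SQLite file name are placeholders.

## Package-level cache; it also holds the DB connection.
datacache <- new.env(hash=TRUE, parent=emptyenv())

## GO seed as described above (table/column names assumed to match the DB).
go_seed <- list(
    objName="GO",
    Class="ProbeGo3AnnDbBimap",
    L2Rchain=list(
        list(tablename="probes", Lcolname="probe_id", Rcolname="gene_id",
             filter="{is_multiple}='0'"),
        list(tablename="genes", Lcolname="gene_id", Rcolname="ID"),
        list(Lcolname="ID", tagname=c(Evidence="{evi}"),
             Rcolname="GOID", Rattribnames=c(Ontology="NULL"))
    ),
    rightTables=c(BP="BP", CC="CC", MF="MF")
)

.onLoad <- function(libname, pkgname)
{
    ## Open the SQLite file shipped in inst/extdata (file name assumed).
    dbfile <- system.file("extdata", "mysalmonchip.sqlite",
                          package=pkgname, lib.loc=libname)
    assign("dbfile", dbfile, envir=datacache)
    dbconn <- AnnotationDbi::dbFileConnect(dbfile)
    assign("dbconn", dbconn, envir=datacache)

    ## Direct maps. Only the GO seed is listed; the non-GO seeds go here too.
    seeds <- list(go_seed)
    seed0 <- list(objTarget="chip mysalmonchip", datacache=datacache)
    ann_objs <- AnnotationDbi:::createAnnDbBimaps(seeds, seed0)

    ## Reverse maps, derived from the direct GO map.
    map <- AnnotationDbi::revmap(ann_objs$GO)
    map@objName <- "GO2PROBE"
    ann_objs$GO2PROBE <- map
    map@rightTables <- c(BP="BP_all", CC="CC_all", MF="MF_all")
    map@objName <- "GO2ALLPROBES"
    ann_objs$GO2ALLPROBES <- map

    ## Export the maps from the package namespace (names are unprefixed here).
    AnnotationDbi::mergeToNamespaceAndExport(ann_objs, pkgname)
}

.onUnload <- function(libpath)
{
    AnnotationDbi::dbFileDisconnect(get("dbconn", envir=datacache))
}
```

Nothing touches the DB until the package is loaded, so the file can be sourced and the seed inspected on its own.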

Hope this helps,

H.

>
> kind regards, Fabian
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives:
> http://news.gmane.org/gmane.science.biology.informatics.conductor

-- 
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpages at fhcrc.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319
