[BioC] Creating annotation package with a new database schema

Marc Carlson mcarlson at fhcrc.org
Fri Oct 26 01:16:22 CEST 2012


Hi Fabian,

For GO at least, if the data is not available at NCBI,
makeOrgPackageFromNCBI will try to use blast2GO instead.

   Marc


On 10/23/2012 01:20 PM, Fabian Grammes wrote:
> Hi Hervé
>
> Thanks a lot, that was exactly the information I was looking for!
>
> After updating Bioconductor today, I am struggling a bit to get the
> code working again, but I hope to have that fixed tomorrow :)
>
> @ Marc
>
> I've looked at the makeOrgPackageFromNCBI function. However, since most
> of my annotation information (GO etc., obtained via Blast2GO) is stored
> locally and is not yet available at NCBI, I do not think the function
> helps in my case.
>
> cheers, F
>
> On Oct 23, 2012, at 12:04 AM, Hervé Pagès wrote:
>
>> Hi Fabian,
>>
>> On 10/22/2012 05:57 AM, Fabian Grammes wrote:
>>> Dear List
>>>
>>> I am working with Atlantic salmon and would very much like to make a
>>> custom annotation package for the microarray that I am using.
>>>
>>> I've worked through the tutorial from Gabor Csardi ("Creating an
>>> annotation package with a new database schema"), which was very
>>> helpful. However, I am struggling to implement the bimap objects to
>>> access the GO annotations that I have in the DB.
>>>
>>> The GO data is stored in 6 tables (BP, BP_all, MF, MF_all, CC, CC_all)
>>> that follow the format I found in the organism packages in BioC:
>>> ID        GOID            evi
>>> 6092    GO:0000910        IEA
>>> 6092    GO:0040035        IEA
>>> 6092    GO:0000398        IEA
>>>
>>> If someone could help me, or point me to the correct way to implement
>>> the GO mappings in an annotation package, that would be great.
>>
>> If you look for example at the hgu95av2.db package, it provides 3
>> predefined Bimaps for accessing the GO data: hgu95av2GO (GO map),
>> hgu95av2GO2PROBE (GO2PROBE map), and hgu95av2GO2ALLPROBES (GO2ALLPROBES
>> map). The 1st is a direct map, the 2nd and 3rd are reverse maps:
>>
>> > direction(hgu95av2GO)
>>  [1] 1
>> > direction(hgu95av2GO2PROBE)
>>  [1] -1
>> > direction(hgu95av2GO2ALLPROBES)
>>  [1] -1
>>
>> All of them are of class "ProbeGo3AnnDbBimap".
>>
>> The predefined Bimaps are created at load-time: the direct maps
>> with a call to AnnotationDbi:::createAnnDbBimaps(), and the reverse
>> maps by "manually" reversing some of the direct maps returned by
>> createAnnDbBimaps().
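
To make the direct/reverse distinction concrete, here is a minimal base-R sketch built from the three sample rows in the original question. It does not use AnnotationDbi: the data frame `go_bp` and the names `direct_map` and `reverse_map` are made up for illustration, and a real Bimap is backed by the SQLite tables rather than by an in-memory list.

```r
# Toy stand-in for one GO table (ID, GOID, evi), using the sample rows
# from the original question. This is an assumption-laden illustration,
# not how AnnotationDbi stores or queries the data.
go_bp <- data.frame(
  ID   = c("6092", "6092", "6092"),
  GOID = c("GO:0000910", "GO:0040035", "GO:0000398"),
  evi  = c("IEA", "IEA", "IEA"),
  stringsAsFactors = FALSE
)

# Direct map: left key (gene ID) -> GO IDs.
direct_map <- split(go_bp$GOID, go_bp$ID)

# Reverse map: right key (GO ID) -> gene IDs, i.e. the same rows read
# in the other direction.
reverse_map <- split(go_bp$ID, go_bp$GOID)

print(direct_map[["6092"]])
print(reverse_map[["GO:0000910"]])
```

Both lists are derived from the same rows, which is why a reverse map can be obtained from a direct one without a second table.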
>>
>> So you need to add an entry for the GO map to the list of "seeds"
>> passed to createAnnDbBimaps(). In your case this entry needs to look
>> something like (assuming ID is your internal id for genes):
>>
>>  seeds <- list(
>>    ...
>>    list(
>>        objName="GO",
>>        Class="ProbeGo3AnnDbBimap",
>>        L2Rchain=list(
>>            list(
>>                tablename="probes",
>>                Lcolname="probe_id",
>>                Rcolname="gene_id",
>>                filter="{is_multiple}='0'"
>>            ),
>>            list(
>>                tablename="genes",
>>                Lcolname="gene_id",
>>                Rcolname="ID"
>>            ),
>>            list(
>>                Lcolname="ID",
>>                tagname=c(Evidence="{evi}"),
>>                Rcolname="GOID",
>>                Rattribnames=c(Ontology="NULL")
>>            )
>>        ),
>>        rightTables=c(BP="BP", CC="CC", MF="MF")
>>    )
>>    ...
>>  )
>>
>> Then:
>>
>>  ann_objs <- AnnotationDbi:::createAnnDbBimaps(seeds, seed0)
>>
>> where 'seed0' is defined by something like:
>>
>>  seed0 <- list(objTarget="chip <name_of_your_chip>",
>>                datacache=datacache)
>>
>> and 'datacache' is the environment that will be used for package-level
>> caching of the data loaded from the DB (use NULL for no caching). I'm
>> assuming those extra details, which are not GO-specific, are covered
>> in Gabor's document, but I don't know for sure.
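
As a rough illustration of what a package-level cache environment can look like, the base-R sketch below creates an environment and serves a value from it after the first load. The helper `cached()`, the loader `load_go()`, and the counter `n_calls` are hypothetical names introduced here. How AnnotationDbi itself reads from or writes to 'datacache' is not shown and is not assumed.

```r
# Package-level cache: an environment with an empty parent, so lookups
# never fall through to other scopes.
datacache <- new.env(hash = TRUE, parent = emptyenv())

# Hypothetical helper: compute a value once, then serve it from the cache.
cached <- function(key, compute) {
  if (!exists(key, envir = datacache, inherits = FALSE)) {
    assign(key, compute(), envir = datacache)
  }
  get(key, envir = datacache, inherits = FALSE)
}

# Hypothetical "expensive" loader; the counter records how often it runs.
n_calls <- 0
load_go <- function() {
  n_calls <<- n_calls + 1
  c("GO:0000910", "GO:0040035", "GO:0000398")
}

first  <- cached("GO", load_go)   # runs load_go()
second <- cached("GO", load_go)   # served from datacache
```

Because environments are modified in place, every function in the package that holds a reference to `datacache` sees the same cached values.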
>>
>> Then you can append the reverse maps to 'ann_objs' with something like:
>>
>>  ## Append GO2PROBE map:
>>  map <- ann_objs$GO
>>  map <- revmap(map)
>>  map@objName <- "GO2PROBE"
>>  ann_objs$GO2PROBE <- map
>>
>>  ## Append GO2ALLPROBES map:
>>  map <- ann_objs$GO2PROBE
>>  map@rightTables <- c(BP="BP_all", CC="CC_all", MF="MF_all")
>>  map@objName <- "GO2ALLPROBES"
>>  ann_objs$GO2ALLPROBES <- map
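
The snippet above relies on R's copy semantics for S4 objects: modifying slots of `map` does not alter the object already stored in `ann_objs`. The sketch below shows this with a hypothetical toy class, `ToyBimap`, that only mimics the two slots used above plus a `direction` slot standing in for revmap(). It is not the real Bimap class and does not touch a database.

```r
library(methods)

# Hypothetical toy class; it only imitates the slots used in the snippet.
setClass("ToyBimap", representation(objName = "character",
                                    rightTables = "character",
                                    direction = "integer"))

ann_objs <- list(
  GO = new("ToyBimap", objName = "GO",
           rightTables = c(BP = "BP", CC = "CC", MF = "MF"),
           direction = 1L)
)

# Derive the reverse map from a copy; ann_objs$GO itself is left untouched.
map <- ann_objs$GO
map@direction <- -1L            # stand-in for revmap(map)
map@objName <- "GO2PROBE"
ann_objs$GO2PROBE <- map

# Start from the reverse map and point it at the *_all tables.
map <- ann_objs$GO2PROBE
map@rightTables <- c(BP = "BP_all", CC = "CC_all", MF = "MF_all")
map@objName <- "GO2ALLPROBES"
ann_objs$GO2ALLPROBES <- map

print(names(ann_objs))
```

Each assignment into `ann_objs` stores an independent copy, so the three entries end up with different names and table sets even though they were built from one another.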
>>
>> All this needs to happen at load-time (via the .onLoad hook). Again I'm
>> focusing on the GO-specific part of the story here, assuming that you've
>> already managed to create the non-GO specific maps (thanks to Gabor's
>> document).
>>
>> Hope this helps,
>>
>> H.
>>
>>>
>>> kind regards, Fabian
>>>
>>> _______________________________________________
>>> Bioconductor mailing list
>>> Bioconductor at r-project.org
>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>> Search the archives:
>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>
>> -- 
>> Hervé Pagès
>>
>> Program in Computational Biology
>> Division of Public Health Sciences
>> Fred Hutchinson Cancer Research Center
>> 1100 Fairview Ave. N, M1-B514
>> P.O. Box 19024
>> Seattle, WA 98109-1024
>>
>> E-mail: hpages at fhcrc.org
>> Phone:  (206) 667-5791
>> Fax:    (206) 667-1319


