[BioC] defining accessors (cols, keys, keytypes, select) for annotation package built with AnnotationForge package

Wed Aug 14 20:52:20 CEST 2013

On 08/14/2013 04:58 AM, Sashi wrote:
> Marc Carlson <mcarlson at ...> writes:
>
>> Hi Sashi,
>>
>> The PDF from Gabor that you are looking at is much older and predates
>> the select method.  These days you probably don't want to follow it,
>> especially if you want to implement a method like select().  I strongly
>> suspect that you really just want to be looking at this vignette instead:
>>
>> http://www.bioconductor.org/packages/release/bioc/vignettes/AnnotationForge/inst/doc/MakingNewAnnotationPackages.pdf
>>
>> To answer your question, GO actually refers to a view that is
>> created in the database over the three GO tables (one each for BP, MF
>> and CC).  But you probably don't need that level of detail.  If you are
>> using org.At.tair.db to look at Arabidopsis, then you may already have
>> everything you need.  And if you need another organism, you probably
>> want to look first at making an org package using
>> makeOrgPackageFromNCBI().  And if for some reason you want to expose
>> some entirely new database resource (in other words, you don't want to
>> make an organism package but something else entirely), then you might
>> need to use the vignette above.
>>
>> I hope this helps you,
>>
>>     Marc
>>
>> On 08/13/2013 04:33 AM, Rameswara Sashi Kiran Challa wrote:
>>> Hi ,
>>>
>>> I am trying to build an organism annotation package using the
>>> AnnotationForge package. I followed this document
>>> <http://www.bioconductor.org/packages/2.12/bioc/vignettes/AnnotationForge/inst/doc/NewSchema.pdf>
>>> written by Gabor Csardi.
>>> I was able to build a sqlite database and create an Annotation package
>>> using the makeAnnDbPkg() function.
>>>
>>> I understand cols(), keys(), keytypes() and select() are set as generic
>>> methods in AnnotationDbi.
>>>
>>> When I look into the methods-AnnotationDb.R script in the AnnotationDbi
>>> package, I see that the cols() method is defined and that it reads all the
>>> columns of all the tables in the sqlite database.
>>>
>>> When I run cols() on org.At.tair.db I get a few values which are not
>>> actually field/column names in the sqlite db. For example, there is no
>>> table called "GO" in the org.At.tair.sqlite database. I am unable to
>>> understand how it creates these values. Could someone please help me
>>> understand how and where exactly these accessor functions are defined,
>>> and how and where they need to be modified to access the data in the
>>> sqlite db that I am creating for the organism I am working on?
>>>
>>> ==========================
>>>
>>>> cols(org.At.tair.db)
>>>  [1] "TAIR"         "CHRLOC"       "CHRLOCEND"    "ENZYME"       "PATH"
>>>  [6] "PMID"         "REFSEQ"       "SYMBOL"       "GENENAME"     "GO"
>>> [11] "EVIDENCE"     "ONTOLOGY"     "GOALL"        "EVIDENCEALL"  "ONTOLOGYALL"
>>> [16] "ARACYC"       "ARACYCENZYME" "ENTREZID"     "CHR"
>>> =======================================
>>>
>>> Please point me to any documentation available for the same.
>>>
>>> Thanks for your time,
>>> Sashi
>>>
>>> _______________________________________________
>>> Bioconductor mailing list
>>> Bioconductor at ...
>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
> Hi Marc,
>
> Thanks for your prompt reply. Following the document you pointed me to, I
> created another R script within the organism package skeleton (separate
> from zzz.R) and defined the cols and keytypes accessor methods there.
>
> Bimaps are created in every annotation package. How do we use these Bimaps
> in the accessor methods? Am I right in thinking that the Bimaps are meant
> to be used in these accessor methods? Or do they have to be accessed only
> via the get(), mget() and toTable() methods?
>
> Also, can you please let me know if there is any documentation available on
> how the GO views are created? I see there are separate tables like go_cc,
> go_mf, go_bp, etc. in the Arabidopsis annotation package. Is it necessary to
> have tables like go_cc, go_mf, go_bp and go_mf_all in the sqlite database
> for the customized annotation package I am creating? Would a single
> table for all GO annotations not suffice?
>
> Thanks again for your time,
> Sashi
Hi Sashi,

I really doubt that you need to think about bimaps at all.  You don't
need them to implement select, cols, keytypes or keys, and they are
really only still supported for the sake of older legacy code.  The get,
mget, and toTable methods are defined to help with bimaps, but you
probably don't need to use these methods anyway.  So it's very unlikely
that you would even need to use bimaps, let alone implement them.

And the go view is just a SQLite database view.  A view is sort of like
a pre-canned database query that appears as a table.  Our "go view" is
really just the union of the go_bp, go_mf, and go_cc tables.  Those three
separate tables let us keep the terms from the different ontologies
separate from each other in the database.  But since we are using a view,
we can also easily query all three of them (as if they were lumped
together) WITHOUT actually duplicating all that data into another
enormous table.  And the performance for this is still great.
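
To make the idea concrete, here is a minimal sketch using Python's
built-in sqlite3 module.  The column layout (gene_id, go_id, evidence),
the gene names and the GO identifiers are all made up for illustration;
they are assumptions, not the actual org.At.tair.db schema, and the real
view definition in the package may differ.

```python
import sqlite3

# In-memory database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")

# Three separate tables, one per ontology (hypothetical column layout).
for suffix in ("bp", "mf", "cc"):
    conn.execute(
        f"CREATE TABLE go_{suffix} (gene_id TEXT, go_id TEXT, evidence TEXT)"
    )

# Placeholder rows; the gene names and GO ids are invented.
conn.execute("INSERT INTO go_bp VALUES ('geneA', 'GO:0000001', 'IEA')")
conn.execute("INSERT INTO go_mf VALUES ('geneA', 'GO:0000002', 'IDA')")
conn.execute("INSERT INTO go_cc VALUES ('geneB', 'GO:0000003', 'ISS')")

# The view is a stored query over the three tables. No rows are copied;
# the ontology column is filled in by the query itself.
conn.execute("""
    CREATE VIEW go AS
        SELECT gene_id, go_id, evidence, 'BP' AS ontology FROM go_bp
        UNION ALL
        SELECT gene_id, go_id, evidence, 'MF' AS ontology FROM go_mf
        UNION ALL
        SELECT gene_id, go_id, evidence, 'CC' AS ontology FROM go_cc
""")

# Query the view exactly as if it were a single "go" table.
rows = conn.execute(
    "SELECT gene_id, go_id, ontology FROM go ORDER BY go_id"
).fetchall()
for row in rows:
    print(row)
```

Anything that lists the database objects will show "go" alongside the
real tables, even though it only exists as a stored query.  That would
explain seeing a "GO" entry without a matching table.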

You can read a bit about how SQLite views are created here if you are
curious:

http://www.sqlite.org/lang_createview.html

But if you are making an org package, why not just use 
makeOrgPackageFromNCBI?

   Marc
