[BioC] defining accessors (cols, keys, keytypes, select) for annotation package built with AnnotationForge package

Fri Aug 23 23:30:32 CEST 2013

On 08/20/2013 12:40 AM, Sashi wrote:
> Marc Carlson <mcarlson at ...> writes:
>
>> On 08/14/2013 04:58 AM, Sashi wrote:
>>> Marc Carlson <mcarlson <at> ...> writes:
>>>
>>>> Hi Sashi,
>>>>
>>>> The PDF from Gabor that you are looking at is much older and was from
>>>> before we even had the select method.  These days you probably don't
>>>> want to do that.  Especially if you want to implement a method like
>>>> select().  I strongly suspect that you really just want to be looking at
>>>> this vignette instead:
>>>>
>>>> http://www.bioconductor.org/packages/release/bioc/vignettes/AnnotationForge/inst/doc/MakingNewAnnotationPackages.pdf
>>>>
>>>> To answer your questions, "GO" actually refers to a view that is
>>>> created in the database over the three GO tables (one each for BP, MF
>>>> and CC).  But you probably don't need that level of detail.  If you are
>>>> using org.At.tair.db to look at Arabidopsis, then you may already have
>>>> everything you need.  And if you need another organism, you probably
>>>> want to look first at making an org package using
>>>> makeOrgPackageFromNCBI().  And if for some reason you want to expose
>>>> some entirely new database resource (in other words, you don't want to
>>>> make an organism package but something else entirely), then you might
>>>> need to use the vignette above.
>>>>
>>>> I hope this helps you,
>>>>
>>>>      Marc
>>>>
>>>> On 08/13/2013 04:33 AM, Rameswara Sashi Kiran Challa wrote:
>>>>> Hi ,
>>>>>
>>>>> I am trying to build an organism annotation package using the
>>>>> AnnotationForge package. I followed this document, written by
>>>>> Gabor Csardi:
>>>>>
>>>>> http://www.bioconductor.org/packages/2.12/bioc/vignettes/AnnotationForge/inst/doc/NewSchema.pdf
>>>>>
>>>>> I was able to build a sqlite database and create an Annotation package
>>>>> using the makeAnnDbPkg() function.
>>>>>
>>>>> I understand that cols(), keys(), keytypes() and select() are set as
>>>>> generic methods in AnnotationDbi.
>>>>>
>>>>> When I look into the methods-AnnotationDb.R script in the AnnotationDbi
>>>>> package, I see that the cols() method is set and that it actually reads
>>>>> all the columns of all the tables in the sqlite database.
>>>>>
>>>>> When I run cols() on org.At.tair.db I get a few values which are not
>>>>> actually field/column names in the sqlite db. For example, there is no
>>>>> table called "GO" in the org.At.tair.sqlite database. I am unable to
>>>>> understand how it creates these values. Could someone please help me
>>>>> understand how and where exactly these accessor functions are defined,
>>>>> and how and where they should be modified to access the data in the
>>>>> sqlite db that I am creating for the organism I am working on?
>>>>>
>>>>> ==========================
>>>>>
>>>>> > cols(org.At.tair.db)
>>>>>  [1] "TAIR"         "CHRLOC"       "CHRLOCEND"    "ENZYME"       "PATH"
>>>>>  [6] "PMID"         "REFSEQ"       "SYMBOL"       "GENENAME"     "GO"
>>>>> [11] "EVIDENCE"     "ONTOLOGY"     "GOALL"        "EVIDENCEALL"  "ONTOLOGYALL"
>>>>> [16] "ARACYC"       "ARACYCENZYME" "ENTREZID"     "CHR"
>>>>> =======================================
>>>>>
>>>>> Please point me to any documentation available for the same.
>>>>>
>>>>> Thanks for your time,
>>>>> Sashi
>>>>>
>>>>> _______________________________________________
>>>>> Bioconductor mailing list
>>>>> Bioconductor <at> ...
>>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>>>> Search the archives:
>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>> Hi Marc,
>>>
>>> Thanks for your prompt reply. Referring to the document you pointed me to, I
>>> created another R script within the organism package skeleton (an R script
>>> apart from zzz.R) and set the cols and keytypes accessor methods.
>>>
>>> Bimaps are created in every annotation package. How do we use these Bimaps
>>> in the accessor methods? Am I right in thinking that the Bimaps are to be
>>> used in these accessor methods? Or do the Bimaps have to be accessed only
>>> via the get(), mget() and toTable() methods?
>>>
>>> Also, can you please let me know if there is any documentation available on
>>> how the GO views are created? I see there are separate tables like go_cc,
>>> go_mf, go_bp, etc. in the Arabidopsis annotation package. Is it necessary to
>>> have tables like go_cc, go_mf, go_bp and go_mf_all in the sqlite database for
>>> the customized annotation package I am creating? Would a single table for
>>> all GO annotations not suffice?
>>>
>>> Thanks again for your time,
>>> Sashi
>> Hi Sashi,
>>
>> I really doubt that you need to think about bimaps at all.  You don't
>> need them to implement select, cols, keytypes or keys.  And they are
>> really only still supported for the sake of older legacy code.  The get,
>> mget, and toTable methods are defined to help with bimaps, but you
>> probably don't need to use these methods anyways. So it's very unlikely
>> that you would even need to use bimaps let alone implement them.
>>
>> And the go view is just a SQLite database view.  A view is sort of like
>> a pre-canned database query that appears as a table.  Our "go view" is
>> really just the union of the go_bp, go_mf, and go_cc tables. Those three
>> separate tables let us keep the terms from the different ontologies
>> separate from each other in the database.  But since we are using a
>> view, we can also easily query all three of them (as if they were lumped
>> together) WITHOUT actually duplicating all that data into another
>> enormous table.  And the performance for this is still great.
>>
>> You can read a bit about how SQLite views are created here if you are
>> curious:
>>
>> http://www.sqlite.org/lang_createview.html
>>
>> But if you are making an org package, why not just use
>> makeOrgPackageFromNCBI?
>>
>>     Marc
>>
> Thanks a lot Marc!!
>
> It's good to know that the Bioconductor community is trying to move away
> from Bimaps and adopt the cols, select, keys and keytypes methods for any
> sort of query. Are the reverse maps that are part of Bimaps taken care of
> by these accessors?
>
> As I understand it, makeOrgPackageFromNCBI still generates Bimaps for the
> sake of older legacy code, and in the near future perhaps all annotation
> packages will just have a sqlite database with these accessors defined. Am
> I correct?
>
> I had started by looking at how to build a sqlite db with some of the
> mappings we have and had not used makeOrgPackageFromNCBI function. My
> thinking was that having an understanding of sqlite db building will enable
> me to add any new mappings that are not part of NCBI.
>
> So, to summarize, for Annotation package development one approach is using
> makeOrgPackageFromNCBI() and the other approach is to make a sqlite db and
> then define these accessors, as given in the pdf you had linked me to
> earlier. And there will be no need of any Bimaps for the package development
> as such.
>
> Thanks for your time,
> -Sashi
Hi Sashi,

We don't aim to remove bimaps from packages that already have them, but
we definitely don't think that new packages need to have anything other
than cols, keys, keytypes and select. Reverse maps are also unnecessary
if you have defined those newer methods.
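
To make the reverse-map point concrete, here is a rough sketch rather than
anything taken from AnnotationDbi itself. It uses Python's standard sqlite3
module only so that it runs without any Bioconductor packages. The table
names go_bp, go_mf and go_cc are the ones discussed earlier in this thread;
the column names, the rows and the lookup() helper are all hypothetical and
exist purely for illustration.

```python
import sqlite3

# Toy in-memory database. Table names follow the thread; the columns
# (gene_id, go_id) and every row below are made up for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE go_bp (gene_id TEXT, go_id TEXT);
    CREATE TABLE go_mf (gene_id TEXT, go_id TEXT);
    CREATE TABLE go_cc (gene_id TEXT, go_id TEXT);
    INSERT INTO go_bp VALUES ('geneA', 'GO:t1'), ('geneB', 'GO:t1');
    INSERT INTO go_mf VALUES ('geneA', 'GO:t2');
    INSERT INTO go_cc VALUES ('geneB', 'GO:t3');

    -- A single view over the three tables; no rows are copied.
    CREATE VIEW go AS
        SELECT gene_id, go_id, 'BP' AS ontology FROM go_bp
        UNION ALL
        SELECT gene_id, go_id, 'MF' AS ontology FROM go_mf
        UNION ALL
        SELECT gene_id, go_id, 'CC' AS ontology FROM go_cc;
""")

def lookup(keytype, key):
    """Hypothetical helper: rows of the go view whose `keytype` column equals `key`."""
    # Column names cannot be bound as SQL parameters, so check them by hand.
    if keytype not in ("gene_id", "go_id"):
        raise ValueError("unknown keytype: " + keytype)
    sql = ("SELECT gene_id, go_id, ontology FROM go "
           "WHERE {} = ? ORDER BY gene_id, go_id").format(keytype)
    return conn.execute(sql, (key,)).fetchall()

forward = lookup("gene_id", "geneA")  # gene -> GO terms
reverse = lookup("go_id", "GO:t1")    # GO term -> genes, via the same view
print(forward)
print(reverse)
```

Both directions go through the same query with a different key column, which
is roughly why a separate reverse map stops being necessary once a
select-style accessor with a keytype argument exists.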

So yes, any new stuff you are developing should really just focus on
those four accessors.  The other stuff is just older stuff that you
shouldn't ever need.  And if you do need it, then we need to change
something so that you no longer need it!  We want to make it EASIER for
people like you who are interested in exposing new resources.

Science is hard enough already.  ;)

   Marc
