[BioC] Creating an OrganismDbi package with a few transcript annotations

Marc Carlson mcarlson at fhcrc.org
Fri May 24 02:51:30 CEST 2013


Hi Michael,

As usual, Martin is on the right track.  The NewSchema.pdf material only 
matters if you really need the old-school bimaps, and bimaps are not 
needed for anything in OrganismDbi.  So implementing the interface of 
keytypes, cols, keys and select really ought to be enough to allow 
integration into OrganismDbi.

And if you also followed the same org package DB schema that we use 
everywhere else, that would be ideal, since in that case you could just 
recycle the methods we have already defined for OrgDb objects.  To make 
that possible, a biomart equivalent to 
AnnotationForge:::makeOrgDbFromNCBI() and 
AnnotationForge:::makeOrgPackageFromNCBI() would be a nice addition.

And I agree that a simple data.frame() based underlying implementation 
would make this easier to generalize.  Right now things are a bit 
specialized for NCBI resources.
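
As a minimal sketch of what a data.frame-backed version of that 
interface might look like: the class name DataFrameAnnotDb, the table 
tx_annot and all of its values are made up for illustration, and the 
four generics are defined locally as stand-ins rather than taken from 
AnnotationDbi.  A real package would presumably define methods on the 
existing generics instead.

```r
library(methods)

# Hypothetical annotation table: one row per transcript (values invented).
tx_annot <- data.frame(
  TXNAME    = c("tx1", "tx2", "tx3"),
  SYMBOL    = c("GENEA", "GENEA", "GENEB"),
  CANONICAL = c(TRUE, FALSE, TRUE),
  stringsAsFactors = FALSE
)

# Hypothetical class that just wraps the table.
setClass("DataFrameAnnotDb", representation(data = "data.frame"))

# Local stand-in generics (assumption: NOT the AnnotationDbi generics).
setGeneric("keytypes", function(x) standardGeneric("keytypes"))
setGeneric("cols", function(x) standardGeneric("cols"))
setGeneric("keys", function(x, keytype, ...) standardGeneric("keys"),
           signature = "x")
setGeneric("select",
           function(x, keys, cols, keytype, ...) standardGeneric("select"),
           signature = "x")

# Every column can serve both as a key type and as a retrievable column.
setMethod("keytypes", "DataFrameAnnotDb", function(x) colnames(x@data))
setMethod("cols", "DataFrameAnnotDb", function(x) colnames(x@data))

# Distinct values of the requested key column.
setMethod("keys", "DataFrameAnnotDb", function(x, keytype, ...) {
  unique(as.character(x@data[[keytype]]))
})

# Rows whose key column matches, restricted to the key plus requested columns.
# Rows come back in table order; unmatched keys are silently dropped.
setMethod("select", "DataFrameAnnotDb",
          function(x, keys, cols, keytype, ...) {
  hits <- x@data[[keytype]] %in% keys
  out <- x@data[hits, unique(c(keytype, cols)), drop = FALSE]
  rownames(out) <- NULL
  out
})

db  <- new("DataFrameAnnotDb", data = tx_annot)
res <- select(db, keys = c("tx1", "tx3"),
              cols = c("SYMBOL", "CANONICAL"), keytype = "TXNAME")
print(res)
```

The sketch does not try to reproduce details such as returning rows in 
the order of the supplied keys or padding unmatched keys with NA.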


   Marc



On 05/17/2013 09:56 PM, Michael Lawrence wrote:
> Cool, thanks Martin. I'll wait for Marc to get back. If what you say is
> correct, it would be nice to have a simple data frame implementation. I'm
> getting the annotations from a biomart, so a biomart implementation would
> be ideal, although that might be tricky semantically.
>
> Michael
>
>
>
>
> On Fri, May 17, 2013 at 5:43 PM, Martin Morgan <mtmorgan at fhcrc.org> wrote:
>
>> On 05/16/2013 01:50 PM, Michael Lawrence wrote:
>>
>>> Hi,
>>>
>>> I'd like to create an OrganismDbi package so that I can put extra
>>> annotations on the transcripts/genes in a TxDb package. My understanding
>>> is that I need a separate database package that I can join with the TxDb
>>> package. Do I need to make an OrgDb package? I looked into this a bit and
>>> it seems that there is little support for making a non-NCBI-based org
>>> package. Maybe I could create a new type of package with a simple table
>>> with a row for each transcript, including the gene symbol and whether the
>>> transcript is "canonical" according to UCSC. It looks like this process is
>>> documented here:
>>> <http://www.bioconductor.org/packages/2.12/bioc/vignettes/AnnotationForge/inst/doc/NewSchema.pdf>.
>>> It also seems really involved. What's the path of least resistance here?
>>>
>> Hi Michael -- Marc is away for a few days. I *think* the idea is that the
>> details in NewSchema are no longer required, rather, implement your extra
>> data in any fashion to provide a 'select' interface, i.e.,
>>
>>    keytypes
>>    keys
>>    cols
>>    select
>>
>> following the implied API of ?keytypes. Then create an OrganismDb package
>> with
>>
>>    OrganismDbi::makeOrganismPackage
>>
>> Sorry not to be more definitive in my help.
>>
>> Martin
>>
>>
>>> Thanks,
>>> Michael
>>>
>>>
>>> _______________________________________________
>>> Bioconductor mailing list
>>> Bioconductor at r-project.org
>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
>>>
>>>
>> --
>> Computational Biology / Fred Hutchinson Cancer Research Center
>> 1100 Fairview Ave. N.
>> PO Box 19024 Seattle, WA 98109
>>
>> Location: Arnold Building M1 B861
>> Phone: (206) 667-2793
>>


