[BioC] GenomicFeatures: makeGeneDbFromBiomart()

Hooiveld, Guido Guido.Hooiveld at wur.nl
Thu Mar 3 14:34:17 CET 2011


Hi Marc,

Thank you for your suggestion. However, the combination of makeTranscriptDb + org.xx.eg.db packages won't work in all cases.
As you probably know, a substantial part of our array analyses is performed with models that are not, or less well, studied in biomedical research, e.g. pig or a variety of plants (medicago, tomato). As a consequence, the annotation efforts are much less thorough and standardized than for e.g. human, mouse or rat, and in turn the BioC annotation infrastructure for these less-standard species is (understandably) less well developed.
Taking pig as an example: although an org.db package is available (org.Ss.eg.db; build Sept 2010), it doesn't (yet?) contain Ensembl-based gene information. Moreover, until very recently (end of Dec 2010) Ensembl had considerably more gene annotation info on the pig genome available than NCBI. I was hoping that a makeGeneDbFromBiomart() function would save me the hassle of manually querying the BioMart website every time, because a BioC-compliant, Ensembl gene-centered database could then be created (and saved!).

For plants the situation is basically even 'worse': where annotation info is available at all, it is often limited and in a format that makes it impossible for me to access easily in BioC. I noticed that the low-level function makeTranscriptDb can create a db object from text files and would hence be ideal for my purpose, except that it is transcript-centered. Often only gene-centered annotation info is available for plants, and I then expect to run into problems, since e.g. info on splicing (required for the data frame 'splicings') is lacking.
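
One possible way around the missing splicing information, given here only as a rough sketch and not as a tested recipe, is to treat every gene as a single pseudo-transcript with one placeholder exon spanning the whole gene. The gene table below is made up for illustration, and the column names follow what makeTranscriptDb is assumed to expect for its 'transcripts', 'splicings' and 'genes' data frames; check them against the GenomicFeatures version in use.

```r
# Hypothetical gene-level table, e.g. as read from a text file with read.delim().
# The IDs and coordinates are invented for illustration.
genes_tbl <- data.frame(
  gene_id = c("geneA", "geneB"),
  chrom   = c("chr1", "chr2"),
  strand  = c("+", "-"),
  start   = c(100L, 500L),
  end     = c(900L, 1200L),
  stringsAsFactors = FALSE
)

n <- nrow(genes_tbl)

# One pseudo-transcript per gene, spanning the whole gene.
transcripts <- data.frame(
  tx_id     = seq_len(n),
  tx_name   = genes_tbl$gene_id,
  tx_chrom  = genes_tbl$chrom,
  tx_strand = genes_tbl$strand,
  tx_start  = genes_tbl$start,
  tx_end    = genes_tbl$end,
  stringsAsFactors = FALSE
)

# One placeholder exon per pseudo-transcript, since no real exon
# structure is known; it simply copies the gene coordinates.
splicings <- data.frame(
  tx_id      = transcripts$tx_id,
  exon_rank  = rep(1L, n),
  exon_start = transcripts$tx_start,
  exon_end   = transcripts$tx_end,
  stringsAsFactors = FALSE
)

# Link each pseudo-transcript back to its gene ID.
genes <- data.frame(
  tx_id   = transcripts$tx_id,
  gene_id = genes_tbl$gene_id,
  stringsAsFactors = FALSE
)

# With GenomicFeatures installed, the three data frames could then be
# passed on (assumption: untested here):
# library(GenomicFeatures)
# txdb <- makeTranscriptDb(transcripts, splicings, genes = genes)
```

The resulting object would still be transcript-centered in form, and anything that relies on real exon or CDS structure would be meaningless for such placeholder entries.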

I hope this explains the reasoning behind my question.

Regards,
Guido

-----Original Message-----
From: bioconductor-bounces at r-project.org [mailto:bioconductor-bounces at r-project.org] On Behalf Of Marc Carlson
Sent: Thursday, March 03, 2011 01:16
To: bioconductor at r-project.org
Subject: Re: [BioC] GenomicFeatures: makeGeneDbFromBiomart()

Hi Guido,

If you just want gene information, then the makeTranscriptDbFromBiomart() function should already give you gene IDs associated with the transcripts, along with grouping information in convenient GRangesList objects.  Yes, this database is focused on the transcripts and their components, but it is not meant to be isolated from proper gene IDs.

And if you then want to link that information to more classic gene-centric annotations, you might want to look at something like the org.Hs.eg.db package (which includes Ensembl IDs).

Using these two resources together, our hope was that it should be possible to do a large number of meaningful things.  So what specifically was it that you needed to do?


  Marc


On 03/01/2011 02:56 AM, Hooiveld, Guido wrote:
> I noticed that the GenomicFeatures package provides a set of very powerful functions to create databases with transcript-centered annotations from e.g. the BioMart database (makeTranscriptDbFromBiomart).
> I was wondering whether a function could be added that would allow building a gene-centered annotation database, e.g. 'makeGeneDbFromBiomart()' and/or 'makeGeneDb'.
> I am asking because I would like to easily retrieve AND store the annotation info of all Ensembl mouse genes. I already had a look at the source code to see whether I could modify parts of it myself to create such a function, but the code is too complicated for me to feel comfortable adapting it. I have the *feeling*, though, that this would be rather straightforward for the more knowledgeable R gurus, hence my question.
>
> Thanks in advance for considering,
> Guido
>
> ------------------------------------------------
> Guido Hooiveld, PhD
> Nutrition, Metabolism & Genomics Group
> Division of Human Nutrition
> Wageningen University
> Biotechnion, Bomenweg 2
> NL-6703 HD Wageningen
> the Netherlands
> tel: (+)31 317 485788
> fax: (+)31 317 483342
> email:      guido.hooiveld at wur.nl
> internet:   http://nutrigene.4t.com
> http://www.researcherid.com/rid/F-4912-2010
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: 
> http://news.gmane.org/gmane.science.biology.informatics.conductor
>
