[BioC] Create transcriptDb using gff3 files? - library GenomicFeatures and rtracklayer

Cook, Malcolm MEC at stowers.org
Thu Apr 5 17:01:30 CEST 2012


Supporting both Ensembl's GTF and GFF3 would be ideal.

Ensembl GTF would open up many genomes, including those in:
	ftp://ftp.ensembl.org/pub/release-66/gtf/
	ftp://ftp.ensemblgenomes.org/pub/metazoa/release-13/gtf/
	ftp://ftp.ensemblgenomes.org/pub/fungi/release-13/gtf/
	ftp://ftp.ensemblgenomes.org/pub/protists/release-13/gtf/
	ftp://ftp.ensemblgenomes.org/pub/plants/release-13/gtf/
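
For readers unfamiliar with the format, here is a minimal sketch of what reading one record from such a GTF file involves. It is written in Python purely for illustration; it is not part of GenomicFeatures or rtracklayer. The function name `parse_gtf_line` and the example identifiers are hypothetical, and the sketch assumes every attribute value in column 9 is double-quoted.

```python
import re

# Matches one GTF attribute such as: gene_id "GENE1";
# Assumption: every value is double-quoted.
_GTF_ATTR = re.compile(r'\s*(\S+)\s+"([^"]*)"\s*;')


def parse_gtf_line(line):
    """Split one GTF record into typed fixed columns plus an attribute dict."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 9:
        raise ValueError("expected 9 tab-separated columns, got %d" % len(cols))
    seqname, source, feature, start, end, _score, strand, _frame, attrs = cols
    return {
        "seqname": seqname,
        "source": source,
        "feature": feature,
        "start": int(start),  # GTF coordinates are 1-based and inclusive
        "end": int(end),
        "strand": strand,
        "attributes": dict(_GTF_ATTR.findall(attrs)),
    }


# Made-up record in the GTF layout; the identifiers are illustrative only.
example = "\t".join([
    "1", "protein_coding", "exon", "1000", "1200", ".", "+", ".",
    'gene_id "GENE1"; transcript_id "TX1"; exon_number "1";',
])
record = parse_gtf_line(example)
print(record["attributes"]["transcript_id"], record["start"], record["end"])
```

Grouping exon records by their `transcript_id` attribute is then enough to recover each transcript's exon structure, which is why GTF is comparatively easy to work with.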


Supporting Ensembl GTF would make it easy to distribute/archive the elements of a transcriptome analysis alongside a project/analysis in a generally useful format (i.e. IGV and other tools can work with it more or less directly).

On a related note, I have learned that the BioMarts produced for Ensembl Genomes are NOT ARCHIVED, whereas it seems that historic GTF IS available.  Upshot: you'd best not depend upon being able to reproduce tomorrow the TranscriptDb you build today with makeTranscriptDbFromBiomart.

re: "typical gff3 files"...
FlyBase makes gff3 extracts and, if my understanding is correct, has been diligent in "getting it right".

Also, NCBI historically has tried to provide GFFx extracts, with oodles of caveats.
But last month they announced progress on improving their GFF3 offerings:  http://bio.perl.org/pipermail/bioperl-l/2012-March/036387.html
Example: ftp://ftp.ncbi.nlm.nih.gov/genomes/H_sapiens/GFF/
YMMV.

I too once hoped to find a makeTranscriptDbFromGFF3 capability, so as to allow easy tracking of the head of FlyBase's offerings, i.e. ftp://ftp.flybase.net/genomes/Drosophila_melanogaster/dmel_r5.44_FB2012_02/gff/ - alas, I too have not followed up.
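
To make the missing step concrete, here is a rough sketch of how GFF3 records could be reshaped into the two tables the original question mentions (transcripts and splicings). It is in Python for illustration only and is not an existing GenomicFeatures function. The function names, the default of treating only `mRNA` features as transcripts, and the column names (loosely modeled on what makeTranscriptDb expects) are all assumptions.

```python
from collections import defaultdict
from urllib.parse import unquote


def parse_gff3_attributes(field):
    """Turn 'ID=x;Parent=y,z' into {'ID': ['x'], 'Parent': ['y', 'z']}."""
    attrs = {}
    for pair in field.strip().split(";"):
        if not pair:
            continue
        key, _, value = pair.partition("=")
        # GFF3 percent-encodes reserved characters inside values.
        attrs[key] = [unquote(v) for v in value.split(",")]
    return attrs


def gff3_to_tables(lines, transcript_types=("mRNA",)):
    """Build (transcripts, splicings) row lists from GFF3 lines.

    Hypothetical helper: exons are linked to transcripts via Parent=.
    """
    transcripts = []
    exons_by_tx = defaultdict(list)
    for line in lines:
        if line.startswith("##FASTA"):
            break  # any embedded sequence section follows; stop reading
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, comments and directives
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue  # silently skip malformed rows in this sketch
        seqid, _src, ftype, start, end, _score, strand, _phase, attr = cols
        attrs = parse_gff3_attributes(attr)
        if ftype in transcript_types:
            transcripts.append({
                "tx_id": len(transcripts) + 1,
                "tx_name": attrs["ID"][0],
                "tx_chrom": seqid,
                "tx_strand": strand,
                "tx_start": int(start),
                "tx_end": int(end),
            })
        elif ftype == "exon":
            # An exon may list several parent transcripts.
            for parent in attrs.get("Parent", []):
                exons_by_tx[parent].append((int(start), int(end)))

    splicings = []
    for tx in transcripts:
        # Rank exons 5' to 3': ascending on '+', descending on '-'.
        exons = sorted(exons_by_tx[tx["tx_name"]],
                       reverse=(tx["tx_strand"] == "-"))
        for rank, (ex_start, ex_end) in enumerate(exons, start=1):
            splicings.append({
                "tx_id": tx["tx_id"],
                "exon_rank": rank,
                "exon_start": ex_start,
                "exon_end": ex_end,
            })
    return transcripts, splicings


# Tiny made-up GFF3 fragment: one minus-strand transcript with two exons.
example_lines = [
    "##gff-version 3",
    "chr1\t.\tgene\t100\t900\t.\t-\t.\tID=g1",
    "chr1\t.\tmRNA\t100\t900\t.\t-\t.\tID=t1;Parent=g1",
    "chr1\t.\texon\t100\t300\t.\t-\t.\tID=e1;Parent=t1",
    "chr1\t.\texon\t500\t900\t.\t-\t.\tID=e2;Parent=t1",
]
transcripts, splicings = gff3_to_tables(example_lines)
for row in splicings:
    print(row)
```

The part that is not trivial in practice is deciding which feature types count as transcripts and how CDS rows map onto exons, since providers differ on both; the sketch sidesteps this by only handling `mRNA` and `exon`.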

~Malcolm


> -----Original Message-----
> From: bioconductor-bounces at r-project.org
> [mailto:bioconductor-bounces at r-project.org] On Behalf Of Marc Carlson
> Sent: Wednesday, April 04, 2012 7:44 PM
> To: bioconductor at r-project.org
> Subject: Re: [BioC] Create transcriptDb using gff3 files? - library
> GenomicFeatures and rtracklayer
> 
> I was looking at this during the course, and this is on my TODO list for
> the next release cycle.  I think it is long overdue and I don't think
> that the community is going to get it done in spite of all the
> enthusiasm.  There has not been time to do it before now but I am hoping
> that will now change.  It should be simple enough in principle, but it
> might not be exactly trivial as I have discovered (on closer inspection)
> that the gff specification is not as concrete as one would like it to
> be.  Also there have been several different versions.
> 
> Some things that can help speed me along:
> 
> 1) which version is most important?  gff3?  Or one of the other
> versions?  It is likely that with the older versions we may not be able
> to extract as much meaningful information.
> 
> 2) where is the best place to find some typical gff3 files for
> examples?  This should not be difficult, but when I was looking before I
> was finding that people were surprisingly stingy about sharing these.
> 
> 
>    Marc
> 
> 
> 
> On 04/03/2012 03:57 PM, Michael Lawrence wrote:
> > Marc was working on this during the course in Feb. Not sure what
> > happened to it. He said it was simple. Maybe just waiting for the
> > release to pass.
> >
> > Michael
> >
> > On Tue, Apr 3, 2012 at 3:40 PM, Steve Lianoglou <mailinglist.honeypot at gmail.com> wrote:
> >
> >> Hi,
> >>
> >> On Tue, Apr 3, 2012 at 4:41 PM, Sang Chul Choi <schoi at cornell.edu> wrote:
> >>> Hi,
> >>>
> >>> I am wondering if I could create a TranscriptDb object (library
> >>> GenomicFeatures) using a gff3 file.  I could read a gff3 file using
> >>> import.gff3, but I could not find a way to create a TranscriptDb object
> >>> from the object returned by import.gff3.
> >>>
> >>> Two arguments for makeTranscriptDb are required: transcripts, splicings.
> >>> It does not seem to be easy to parse this information from the object
> >>> returned by import.gff3.  I will appreciate any help.
> >>
> >> As far as I know, this functionality isn't there yet ...
> >>
> >> I once (early feb, 2012) suggested I might take a crack at making this
> >> happen but haven't actually found the time to do it ... I'm not sure
> >> anyone in bioc-core land (hi, Marc) has found the time to do it
> >> either, so I think you're out of luck.
> >>
> >> Sorry for that. But the good news is that I bet a patch that does this
> >> would be welcome ;-)
> >>
> >> -steve
> >>
> >> --
> >> Steve Lianoglou
> >> Graduate Student: Computational Systems Biology
> >>   | Memorial Sloan-Kettering Cancer Center
> >>   | Weill Medical College of Cornell University
> >> Contact Info: http://cbio.mskcc.org/~lianos/contact
> >>
> >> _______________________________________________
> >> Bioconductor mailing list
> >> Bioconductor at r-project.org
> >> https://stat.ethz.ch/mailman/listinfo/bioconductor
> >> Search the archives:
> >> http://news.gmane.org/gmane.science.biology.informatics.conductor
> >>