[BioC] NCBI gff3 annotation file and read.gff()

Marc Carlson mcarlson at fhcrc.org
Wed Jul 16 19:58:29 CEST 2014


Yes, you can definitely use makeTranscriptDbFromGFF if you want a
TranscriptDb object.  The following works, for example:

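library("GenomicFeatures")

## Build a TranscriptDb directly from NCBI's gzipped GFF3 file
txdb <- makeTranscriptDbFromGFF(
    file="ref_Macaca_fascicularis_5.0_top_level.gff3.gz",
    format="gff3",
    ## No attribute is named for exon ranks or gene IDs here; see the
    ## note about exonRankAttributeName below
    exonRankAttributeName=NA,
    gffGeneIdAttributeName=NA,
    chrominfo=NA,
    dataSource=NA,
    species=NA,
    circ_seqs=DEFAULT_CIRC_SEQS,
    miRBaseBuild=NA,
    useGenesAsTranscripts=FALSE)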

But is massaging this into a transcriptome what we want here?  Ugo
hasn't told us what he wants to do with this data.  Also, I didn't look
closely at the data itself.  It may be that you can specify a value for
exonRankAttributeName (which you should always do if the data allow
it).


  Marc



On 07/16/2014 09:10 AM, Michael Lawrence wrote:
> Is there anything makeTranscriptDbFromGFF could do to help with this?
> Sounds like you typically want something like a TxDb, except perhaps with
> some special considerations. Following the NCBI conventions is probably
> worth it.
>
>
> On Wed, Jul 16, 2014 at 8:58 AM, Chris Stubben <stubben at lanl.gov> wrote:
>
>> I would also suggest using rtracklayer's import() or creating a genome
>> data package.  At least for microbial genomes, you often just need to
>> return the features (CDS, pseudogenes, tRNAs, etc.) whose parent has a
>> locus_tag key and assign that locus tag to the child (the read.gff
>> default); that is the step getting messed up with your large file.
>> I'll probably use the rtracklayer import object in future versions
>> instead, and then join on Parent (where locus_tag is NA) to the ID
>> (where locus_tag is not NA).
>> Chris
>>
>>
>>
>>> I cc'd the packageMaintainer(), so that they are more likely to see this
>>> post.
>>>
>>> I don't know whether this helps in this particular case, but packages
>>> should be using rtracklayer::import rather than creating their own readers.
>>> Then at least whatever deficiencies are identified and corrected benefit
>>> the entire project.
>>>
>>
>>
>>
>> --
>>
>> Chris Stubben
>>
>> Los Alamos National Lab
>> Bioscience Division
>> MS M888
>> Los Alamos, NM 87545
>>
>> Phone: (505) 667-3295
>>
>>
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at r-project.org
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
>>

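As a rough sketch of the join Chris describes (features whose locus_tag is NA take the locus_tag of the feature whose ID matches their Parent), here is a base R version run on a small made-up table. The table `feats` and the helper `fill_locus_tag()` are hypothetical. The columns are assumed to mirror ID, Parent and locus_tag as they would appear in mcols() of an rtracklayer GFF3 import, simplified to a single Parent per feature (rtracklayer actually returns Parent as a list-like column, because GFF3 allows several parents).

```r
## Hypothetical stand-in for mcols(rtracklayer::import(...)): one row per
## GFF3 feature, simplified to a single Parent ID per row.
feats <- data.frame(
  ID        = c("gene0", "cds0", "rna0", "gene1", "cds1", "exon0"),
  type      = c("gene", "CDS", "tRNA", "gene", "CDS", "exon"),
  Parent    = c(NA, "gene0", "gene0", NA, "gene1", "rna0"),
  locus_tag = c("b0001", NA, NA, "b0002", NA, NA),
  stringsAsFactors = FALSE
)

## Hypothetical helper: features with no locus_tag inherit it from the
## feature whose ID matches their Parent. Repeating the join lets
## grandchildren (exon -> tRNA -> gene) pick up the tag as well.
fill_locus_tag <- function(df, max_depth = 5) {
  for (i in seq_len(max_depth)) {
    has_tag <- !is.na(df$locus_tag)
    lookup  <- setNames(df$locus_tag[has_tag], df$ID[has_tag])
    ## NA Parents are never %in% names(lookup), so top-level features are skipped
    todo <- !has_tag & df$Parent %in% names(lookup)
    if (!any(todo)) break
    df$locus_tag[todo] <- unname(lookup[df$Parent[todo]])
  }
  df
}

out <- fill_locus_tag(feats)
## Non-gene features, now labelled with their ancestor gene's locus tag
out[out$type != "gene", c("ID", "type", "locus_tag")]
```

A single pass would only cover direct children of a tagged feature; repeating the join handles deeper hierarchies, and max_depth simply caps the number of passes.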

