[BioC] makeTranscriptDbFromGFF fails on NCBI Bacteria genomes

Marc Carlson mcarlson at fhcrc.org
Thu Aug 22 20:27:39 CEST 2013


On 08/22/2013 02:12 AM, Sarah Pohl wrote:
> Cook, Malcolm <MEC at ...> writes:
>
>> FYI, bioperl includes bp_genbank2gff3.pl
>>
>> which when run as
>>
>>> bp_genbank2gff3.pl NC_011025.gbk
>> produces NC_011025.gbk.gff (attached)
>>
>> which loaded without error with transcript:
>>
>>> txdb <- makeTranscriptDbFromGFF(file="NC_011025.gbk.gff", format="gff3",
>>     dataSource="NCBI", species="Some bact")
>> extracting transcript information
>> Extracting gene IDs
>> extracting transcript information
>> Processing splicing information for gff3 file.
>> Deducing exon rank from relative coordinates provided
>> Prepare the 'metadata' data frame ... metadata: OK
>> Now generating chrominfo from available sequence names. No chromosome
>> length information is available.
>> Warning messages:
>> 1: In .deduceExonRankings(exs, format = "gff") :
>>    Infering Exon Rankings.  If this is not what you expected, then please
>>    be sure that you have provided a valid attribute for exonRankAttributeName
>> 2: In matchCircularity(chroms, circ_seqs) :
>>    None of the strings in your circ_seqs argument match your seqnames.
>>> txdb
>> TranscriptDb object:
>> | Db type: TranscriptDb
>> | Supporting package: GenomicFeatures
>> | Data source: NCBI
>> | Genus and Species: Some bact
>> | miRBase build ID: NA
>> | transcript_nrow: 631
>> | exon_nrow: 631
>> | cds_nrow: 631
>> | Db created by: GenomicFeatures package from Bioconductor
>> | Creation time: 2013-06-07 14:52:50 -0500 (Fri, 07 Jun 2013)
>> | GenomicFeatures version at creation time: 1.10.2
>> | RSQLite version at creation time: 0.11.2
>> | DBSCHEMAVERSION: 1.0
>
> Hey,
>
> I know I'm a bit late for this discussion, but I have a similar problem.
>
> I have a bacterial GBK file which I tried to convert using the
> bp_genbank2gff3.pl script,
>      perl bp_genbank2gff3.pl annotation/NC_008463.gbk -o annotation/
> but I got the following error:
>     "Can't call method "binomial" on an undefined value at bp_genbank2gff3.pl
> line 672, <FH> line 208948."
> So instead I converted it with Biopython and the BCBio module, which worked
> fine.
> Only now, when I try to load it with makeTranscriptDbFromGFF,
>      txdb <- makeTranscriptDbFromGFF(file="NC_008463.gff", format="gff3",
> dataSource="CDS", species="Pseudomonas aeruginosa")
> I also get an error:
>      Error in unique(tables[["transcripts"]][["tx_chrom"]]) :
>      'unique': Error: object 'tables' not found
>
> Why does this happen and what can I do about it?
>

Hi Sarah,

It's hard to help because your post doesn't give enough detail to tell 
what actually happened.  I can't be sure that the other scripts you 
mention produced a valid gff3 file, and I don't know which versions of 
the software you are using.  Please see our posting guide here:

http://www.bioconductor.org/help/mailing-list/posting-guide/
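
As a minimal sketch of the version information the posting guide asks for: 
run the following in the same R session where the error occurred, after 
loading GenomicFeatures so that its version is listed, and paste the 
output into your post.  The variable name si is only for illustration.

```r
# Capture the R version, platform, and versions of all attached packages.
# Run this after library(GenomicFeatures) so that package appears in the output.
si <- sessionInfo()
print(si)
```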

But I will go out on a limb anyway and guess (based only on the error 
message in your post) that your problem might go away if you pass a 
value to the chrominfo argument.  The manual page has an example of how 
to use that argument; you can pull it up like this:

help(makeTranscriptDbFromGFF)
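
A rough sketch of what passing chrominfo could look like for the file in 
your post.  The sequence name, the length, and the circularity flag below 
are assumptions, not values taken from your data: chrom has to match the 
sequence ID in the first column of your GFF3 file, and length should be 
replaced with the genome length from your GenBank record.  Whether this 
actually clears the error is still a guess.

```r
# Sketch only: the values in this data frame are placeholders.
# chrominfo needs one row per sequence with these three columns.
chrominfo <- data.frame(
  chrom       = "NC_008463",  # assumed seqname; check column 1 of the GFF3 file
  length      = NA_integer_,  # placeholder; use the length from the GenBank LOCUS line
  is_circular = TRUE,         # assumed: a circular bacterial chromosome
  stringsAsFactors = FALSE
)

gff_file <- "NC_008463.gff"

# Attempt the import only when the package and the file are actually available.
if (requireNamespace("GenomicFeatures", quietly = TRUE) && file.exists(gff_file)) {
  txdb <- GenomicFeatures::makeTranscriptDbFromGFF(
    file = gff_file, format = "gff3", chrominfo = chrominfo,
    dataSource = "CDS", species = "Pseudomonas aeruginosa"
  )
  print(txdb)
}
```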

Hope this helps,


   Marc


