[BioC] makeTranscriptDbFromGFF fails on NCBI Bacteria genomes

Sarah Pohl sarah.pohl at helmholtz-hzi.de
Thu Aug 22 11:12:51 CEST 2013


Cook, Malcolm <MEC at ...> writes:

> 
> FYI, bioperl includes bp_genbank2gff3.pl
> 
> which when run as
> 
> > bp_genbank2gff3.pl NC_011025.gbk
> 
> produces NC_011025.gbk.gff (attached)
> 
> which loaded without error with transcript:
> 
> > txdb <- makeTranscriptDbFromGFF(file="NC_011025.gbk.gff", format="gff3",
>     dataSource="NCBI", species="Some bact")
> extracting transcript information
> Extracting gene IDs
> extracting transcript information
> Processing splicing information for gff3 file.
> Deducing exon rank from relative coordinates provided
> Prepare the 'metadata' data frame ... metadata: OK
> Now generating chrominfo from available sequence names. No chromosome length information is available.
> Warning messages:
> 1: In .deduceExonRankings(exs, format = "gff") :
>   Infering Exon Rankings.  If this is not what you expected, then please be sure that you have provided a valid attribute for exonRankAttributeName
> 2: In matchCircularity(chroms, circ_seqs) :
>   None of the strings in your circ_seqs argument match your seqnames.
> > txdb
> TranscriptDb object:
> | Db type: TranscriptDb
> | Supporting package: GenomicFeatures
> | Data source: NCBI
> | Genus and Species: Some bact
> | miRBase build ID: NA
> | transcript_nrow: 631
> | exon_nrow: 631
> | cds_nrow: 631
> | Db created by: GenomicFeatures package from Bioconductor
> | Creation time: 2013-06-07 14:52:50 -0500 (Fri, 07 Jun 2013)
> | GenomicFeatures version at creation time: 1.10.2
> | RSQLite version at creation time: 0.11.2
> | DBSCHEMAVERSION: 1.0


Hey,

I know I'm a bit late to this discussion, but I have a similar problem.

I have a bacterial GBK file which I tried to convert using the
bp_genbank2gff3.pl script:
    perl bp_genbank2gff3.pl annotation/NC_008463.gbk -o annotation/
but I got the following error:
    Can't call method "binomial" on an undefined value at bp_genbank2gff3.pl line 672, <FH> line 208948.
So instead I converted it with Biopython and the BCBio module, which worked
fine.
But now, when I try to load the converted file with makeTranscriptDbFromGFF,
    txdb <- makeTranscriptDbFromGFF(file="NC_008463.gff", format="gff3",
        dataSource="CDS", species="Pseudomonas aeruginosa")
I also get an error:
    Error in unique(tables[["transcripts"]][["tx_chrom"]]) : 
    'unique': Error: object 'tables' not found

Why does this happen and what can I do about it?
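
One way to narrow this down is to look at which feature types (column 3) the converted GFF3 actually contains, since different converters emit different sets of gene/mRNA/exon/CDS rows. The sketch below uses only the Python standard library. The function name count_feature_types and the sample lines are illustrative and not taken from NC_008463.gff. Whether a missing feature type is the cause of the error above is an assumption, not something the error message confirms.

```python
from collections import Counter
from typing import Iterable


def count_feature_types(lines: Iterable[str]) -> Counter:
    """Tally the feature type (column 3) of each GFF3 feature line."""
    counts = Counter()
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("##FASTA"):
            break  # an embedded sequence section follows; no more features
        if not line or line.startswith("#"):
            continue  # skip blank lines, comments and directives
        fields = line.split("\t")
        if len(fields) != 9:
            continue  # not a well-formed 9-column feature line
        counts[fields[2]] += 1
    return counts


# Illustrative lines only (hypothetical, not from the real file).
sample = [
    "##gff-version 3",
    "NC_008463\t.\tgene\t1\t1500\t.\t+\t.\tID=gene0",
    "NC_008463\t.\tCDS\t1\t1500\t.\t+\t0\tID=cds0;Parent=gene0",
]
counts = count_feature_types(sample)
print(counts)

# For a real file, pass the open file handle instead:
#     with open("NC_008463.gff") as fh:
#         print(count_feature_types(fh))
```

Comparing this tally for a file that loads (such as the NC_011025.gbk.gff from the quoted message) with the one that fails would show whether the two converters produce different feature types.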


