[BioC] makeTranscriptDbFromGFF fails on NCBI Bacteria genomes

Sarah Pohl Sarah.Pohl at helmholtz-hzi.de
Fri Aug 23 12:49:24 CEST 2013


Hey Marc,

I'm sorry, I came here via gmane.org and didn't see the posting guide. I'll attach the relevant information this time.
I tried with the chrominfo argument, and in a sense it works. At least there's no error about the missing chromosome size now. The main error stays the same, though.

I checked my gff3 file with http://modencode.oicr.on.ca/cgi-bin/validate_gff3_online yesterday and according to them it is fine.
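
A note on the validation step: a file can pass GFF3 validation and still lack the feature types an importer looks for. As a rough diagnostic, the sketch below counts the feature types in column 3 of a GFF3 file using base R only. The helper name count_gff_types is hypothetical, and which feature types makeTranscriptDbFromGFF expects (for example mRNA or exon rows in addition to gene and CDS) is an assumption here, not something confirmed in this thread.

```r
# Hypothetical helper: tabulate the feature types (column 3) of a GFF3 file.
count_gff_types <- function(path) {
  lines <- readLines(path)
  # Ignore everything from the optional ##FASTA section onwards
  fasta <- grep("^##FASTA", lines)
  if (length(fasta) > 0) lines <- lines[seq_len(fasta[1] - 1)]
  # Drop empty lines, comments and directives
  lines <- lines[nzchar(lines) & !grepl("^#", lines)]
  # GFF3 columns are tab-separated; the third column is the feature type
  fields <- strsplit(lines, "\t", fixed = TRUE)
  types <- vapply(
    fields,
    function(f) if (length(f) >= 3) f[3] else NA_character_,
    character(1)
  )
  table(types, useNA = "ifany")
}
```

Calling count_gff_types() on the NC_008463.gff file would show at a glance whether it contains only gene and CDS rows; including that table in a follow-up post might help others diagnose the problem.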

Here's the code:
library(VariantAnnotation)
library(GenomicFeatures)
library(BSgenome)
inf <- data.frame(cbind("NC_008463", 6537648, TRUE))
txdb <- makeTranscriptDbFromGFF(file="//CPI-SL64001/spo12/BSgenome/annotation/NC_008463.gff", format="gff3", dataSource="CDS", species="Pseudomonas aeruginosa", chrominfo=inf)

the error:
Prepare the 'metadata' data frame ... metadata: OK
Error in is.data.frame(arg) : object 'tables' not found
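
A side note on the chrominfo value used above: cbind() first builds a character matrix, so the resulting data frame has the default column names X1, X2, X3 and no numeric or logical columns. Below is a minimal sketch of a data frame with named, typed columns. The column names chrom, length and is_circular follow my reading of the makeTranscriptDb manual page and should be checked against the installed version. This is not claimed to resolve the 'tables' error.

```r
# Build chrominfo with explicit column names and types.
# Column names are taken from my reading of ?makeTranscriptDb (assumption).
chrominfo <- data.frame(
  chrom = "NC_008463",
  length = 6537648L,
  is_circular = TRUE,
  stringsAsFactors = FALSE
)
str(chrominfo)

# For comparison: going through cbind() loses the names and the types.
bad <- data.frame(cbind("NC_008463", 6537648, TRUE))
sapply(bad, class)
```

The chrominfo object could then be passed as chrominfo=chrominfo in the makeTranscriptDbFromGFF call.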

and the session info:
R version 3.0.1 (2013-05-16)
Platform: x86_64-w64-mingw32/x64 (64-bit)

locale:
[1] LC_COLLATE=German_Germany.1252  LC_CTYPE=German_Germany.1252    LC_MONETARY=German_Germany.1252
[4] LC_NUMERIC=C                    LC_TIME=German_Germany.1252

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
 [1] BSgenome_1.28.0         GenomicFeatures_1.12.3  AnnotationDbi_1.22.6
 [4] Biobase_2.20.1          VariantAnnotation_1.6.7 Rsamtools_1.12.3
 [7] Biostrings_2.28.0       GenomicRanges_1.12.4    IRanges_1.18.3
[10] BiocGenerics_0.6.0

loaded via a namespace (and not attached):
 [1] biomaRt_2.16.0     bitops_1.0-6       DBI_0.2-7          RCurl_1.95-4.1     RSQLite_0.11.4
 [6] rtracklayer_1.20.4 stats4_3.0.1       tools_3.0.1        XML_3.98-1.1       zlibbioc_1.6.0

Date: Thu, 22 Aug 2013 11:27:39 -0700
From: Marc Carlson <mcarlson at fhcrc.org>
To: bioconductor at r-project.org
Subject: Re: [BioC] makeTranscriptDbFromGFF fails on NCBI Bacteria
        genomes
Message-ID: <5216581B.8090608 at fhcrc.org>



On 08/22/2013 02:12 AM, Sarah Pohl wrote:
> Cook, Malcolm <MEC at ...> writes:
>
>> FYI, bioperl includes bp_genbank2gff3.pl
>>
>> which when run as
>>
>>> bp_genbank2gff3.pl NC_011025.gbk
>> produces NC_011025.gbk.gff (attached)
>>
>> which loaded without error with transcript:
>>
>>> txdb <- makeTranscriptDbFromGFF(file="NC_011025.gbk.gff", format="gff3",
>>                                  dataSource="NCBI", species="Some bact")
>> extracting transcript information
>> Extracting gene IDs
>> extracting transcript information
>> Processing splicing information for gff3 file.
>> Deducing exon rank from relative coordinates provided
>> Prepare the 'metadata' data frame ... metadata: OK
>> Now generating chrominfo from available sequence names. No chromosome length information is available.
>> Warning messages:
>> 1: In .deduceExonRankings(exs, format = "gff") :
>>    Infering Exon Rankings.  If this is not what you expected, then please be sure that you have provided a valid attribute for exonRankAttributeName
>> 2: In matchCircularity(chroms, circ_seqs) :
>>    None of the strings in your circ_seqs argument match your seqnames.
>>> txdb
>> TranscriptDb object:
>> | Db type: TranscriptDb
>> | Supporting package: GenomicFeatures
>> | Data source: NCBI
>> | Genus and Species: Some bact
>> | miRBase build ID: NA
>> | transcript_nrow: 631
>> | exon_nrow: 631
>> | cds_nrow: 631
>> | Db created by: GenomicFeatures package from Bioconductor
>> | Creation time: 2013-06-07 14:52:50 -0500 (Fri, 07 Jun 2013)
>> | GenomicFeatures version at creation time: 1.10.2
>> | RSQLite version at creation time: 0.11.2
>> | DBSCHEMAVERSION: 1.0
>
> Hey,
>
> I know I'm a bit late for this discussion, but I have a similar problem.
>
> I have a bacterial GBK file which I tried to convert using the
> bp_genbank2gff3.pl script,
>      perl bp_genbank2gff3.pl annotation/NC_008463.gbk -o annotation/
> but I got the following error:
>     "Can't call method "binomial" on an undefined value at bp_genbank2gff3.pl
> line 672, <FH> line 208948."
> So instead I converted it with Biopython and the BCBio module, which worked
> fine.
> Only now, when I try to load it with makeTranscriptDbFromGFF,
>      txdb <- makeTranscriptDbFromGFF(file="NC_008463.gff", format="gff3",
> dataSource="CDS", species="Pseudomonas aeruginosa")
> I also get an error:
>      Error in unique(tables[["transcripts"]][["tx_chrom"]]) :
>      'unique': Error: object 'tables' not found
>
> Why does this happen and what can I do about it?
>

Hi Sarah,

It's hard to help you because it's pretty difficult to know what
actually happened after reading your post.  I can't be sure if the other
scripts you mention produced a valid gff3 file and I have no idea which
version of the software you are using.  Please see our posting guide here:

http://www.bioconductor.org/help/mailing-list/posting-guide/

But I will go out on a limb anyways and guess (based only on the error
message in your post) that your problem might improve if you passed a
value to the chrominfo argument.  You can see an example of how to use
that argument on the manual page, which you can pull up like this:

help(makeTranscriptDbFromGFF)

Hope this helps,


   Marc
