[BioC] makeTranscriptDbFromGFF fails on NCBI Bacteria genomes

Marc Carlson mcarlson at fhcrc.org
Fri Aug 23 19:23:49 CEST 2013


Thank you Sarah,

That is much better.  Is this the file you were parsing here?

ftp://ftp.ncbi.nlm.nih.gov/genomes/Bacteria/Pseudomonas_aeruginosa_UCBPP_PA14_uid57977/NC_008463.gff


  Marc
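
Neither message in this thread shows which feature types the GFF3 file actually contains. One quick way to check, independent of GenomicFeatures, is to tabulate the third (type) column with base R. The sketch below is only illustrative: the three feature rows and their gene and CDS coordinates are made up, and only the sequence name and length are taken from the thread. For the real file, point read.delim() at the downloaded NC_008463.gff instead of the temporary file.

```r
# Hypothetical GFF3 snippet (feature coordinates are invented for illustration),
# written to a temporary file so the example runs without any download.
gff_lines <- c(
  "##gff-version 3",
  "NC_008463\tRefSeq\tregion\t1\t6537648\t.\t+\t.\tID=id0",
  "NC_008463\tRefSeq\tgene\t483\t2027\t.\t+\t.\tID=gene0",
  "NC_008463\tRefSeq\tCDS\t483\t2027\t.\t+\t0\tID=cds0;Parent=gene0"
)
gff_file <- tempfile(fileext = ".gff")
writeLines(gff_lines, gff_file)

# GFF3 has nine tab-separated columns; lines starting with '#' are skipped.
# Note: comment.char = "#" would also truncate a row whose attributes
# contain a literal '#', so treat this as a rough check only.
gff <- read.delim(gff_file, header = FALSE, comment.char = "#",
                  quote = "", stringsAsFactors = FALSE,
                  col.names = c("seqid", "source", "type", "start", "end",
                                "score", "strand", "phase", "attributes"))

# Count how many rows there are of each feature type.
type_counts <- table(gff$type)
print(type_counts)
```

If the table shows only types such as region, gene and CDS, that is worth including in a follow-up post, since it tells the reader which rows the importer had to work with.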



On 08/23/2013 03:49 AM, Sarah Pohl wrote:
> Hey Marc,
>
> I'm sorry, I came here via gmane.org and didn't see the posting guide. I'll attach the relevant information this time.
> I tried with the chrominfo argument, and in a sense it works. At least there's no error about the missing chromosome size now. The main error stays the same, though.
>
> I checked my gff3 file with http://modencode.oicr.on.ca/cgi-bin/validate_gff3_online yesterday and according to them it is fine.
>
> Here's the code:
> library(VariantAnnotation)
> library(GenomicFeatures)
> library(BSgenome)
> inf <- data.frame(cbind("NC_008463", 6537648, TRUE))
> txdb <- makeTranscriptDbFromGFF(file="//CPI-SL64001/spo12/BSgenome/annotation/NC_008463.gff", format="gff3", dataSource="CDS", species="Pseudomonas aeruginosa", chrominfo=inf)
>
> the error:
> Prepare the 'metadata' data frame ... metadata: OK
> Error in is.data.frame(arg) : object 'tables' not found
>
> and the session info:
> R version 3.0.1 (2013-05-16)
> Platform: x86_64-w64-mingw32/x64 (64-bit)
>
> locale:
> [1] LC_COLLATE=German_Germany.1252  LC_CTYPE=German_Germany.1252    LC_MONETARY=German_Germany.1252
> [4] LC_NUMERIC=C                    LC_TIME=German_Germany.1252
>
> attached base packages:
> [1] parallel  stats     graphics  grDevices utils     datasets  methods   base
>
> other attached packages:
>   [1] BSgenome_1.28.0         GenomicFeatures_1.12.3  AnnotationDbi_1.22.6
>   [4] Biobase_2.20.1          VariantAnnotation_1.6.7 Rsamtools_1.12.3
>   [7] Biostrings_2.28.0       GenomicRanges_1.12.4    IRanges_1.18.3
> [10] BiocGenerics_0.6.0
>
> loaded via a namespace (and not attached):
>   [1] biomaRt_2.16.0     bitops_1.0-6       DBI_0.2-7          RCurl_1.95-4.1     RSQLite_0.11.4
>   [6] rtracklayer_1.20.4 stats4_3.0.1       tools_3.0.1        XML_3.98-1.1       zlibbioc_1.6.0
>
> Date: Thu, 22 Aug 2013 11:27:39 -0700
> From: Marc Carlson <mcarlson at fhcrc.org>
> To: bioconductor at r-project.org
> Subject: Re: [BioC] makeTranscriptDbFromGFF fails on NCBI Bacteria genomes
>
> On 08/22/2013 02:12 AM, Sarah Pohl wrote:
>> Cook, Malcolm <MEC at ...> writes:
>>
>>> FYI, bioperl includes bp_genbank2gff3.pl
>>>
>>> which when run as
>>>
>>>> bp_genbank2gff3.pl NC_011025.gbk
>>> produces NC_011025.gbk.gff (attached)
>>>
>>> which loaded without error, as this transcript shows:
>>>
>>>> txdb <- makeTranscriptDbFromGFF(file="NC_011025.gbk.gff", format="gff3",
>>>     dataSource="NCBI", species="Some bact")
>>> extracting transcript information
>>> Extracting gene IDs
>>> extracting transcript information
>>> Processing splicing information for gff3 file.
>>> Deducing exon rank from relative coordinates provided
>>> Prepare the 'metadata' data frame ... metadata: OK
>>> Now generating chrominfo from available sequence names. No chromosome length information is available.
>>> Warning messages:
>>> 1: In .deduceExonRankings(exs, format = "gff") :
>>>     Infering Exon Rankings.  If this is not what you expected, then please be sure that you have provided a valid attribute for exonRankAttributeName
>>> 2: In matchCircularity(chroms, circ_seqs) :
>>>     None of the strings in your circ_seqs argument match your seqnames.
>>>> txdb
>>> TranscriptDb object:
>>> | Db type: TranscriptDb
>>> | Supporting package: GenomicFeatures
>>> | Data source: NCBI
>>> | Genus and Species: Some bact
>>> | miRBase build ID: NA
>>> | transcript_nrow: 631
>>> | exon_nrow: 631
>>> | cds_nrow: 631
>>> | Db created by: GenomicFeatures package from Bioconductor
>>> | Creation time: 2013-06-07 14:52:50 -0500 (Fri, 07 Jun 2013)
>>> | GenomicFeatures version at creation time: 1.10.2
>>> | RSQLite version at creation time: 0.11.2
>>> | DBSCHEMAVERSION: 1.0
>> Hey,
>>
>> I know I'm a bit late for this discussion, but I have a similar problem.
>>
>> I have a bacterial GBK file which I tried to convert using the
>> bp_genbank2gff3.pl script,
>>       perl bp_genbank2gff3.pl annotation/NC_008463.gbk -o annotation/
>> but I got the following error:
>>      "Can't call method "binomial" on an undefined value at bp_genbank2gff3.pl
>> line 672, <FH> line 208948."
>> So instead I converted it with Biopython and the BCBio module, which worked
>> fine.
>> Only now, when I try to load it with makeTranscriptDbFromGFF,
>>       txdb <- makeTranscriptDbFromGFF(file="NC_008463.gff", format="gff3",
>> dataSource="CDS", species="Pseudomonas aeruginosa")
>> I also get an error:
>>       Error in unique(tables[["transcripts"]][["tx_chrom"]]) :
>>       'unique': Error: object 'tables' not found
>>
>> Why does this happen and what can I do about it?
>>
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at r-project.org
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
> Hi Sarah,
>
> It's hard to help you because it's pretty difficult to know what
> actually happened after reading your post.  I can't be sure if the other
> scripts you mention produced a valid gff3 file and I have no idea which
> version of the software you are using.  Please see our posting guide here:
>
> http://www.bioconductor.org/help/mailing-list/posting-guide/
>
> But I will go out on a limb anyway and guess (based only on the error
> message in your post) that your problem might get better if you passed a
> value to the chrominfo argument.  You can see an example of how to use
> that argument by pulling up the manual page like this:
>
> help(makeTranscriptDbFromGFF)
>
> Hope this helps,
>
>
>     Marc
>
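
For the chrominfo suggestion above, here is a minimal sketch of how such a data frame could be built. It assumes the layout described on the ?makeTranscriptDb manual page (columns named chrom, length and is_circular) and reuses the sequence name, length and circularity flag from Sarah's call; the variable names inf_cbind and chrominfo are only illustrative. The data.frame(cbind(...)) idiom is shown for contrast, because cbind() coerces all three values to character and leaves the columns with default names.

```r
# The idiom from the post: cbind() builds a character matrix first, so the
# length and the logical flag lose their types and the columns are unnamed.
inf_cbind <- data.frame(cbind("NC_008463", 6537648, TRUE))
print(names(inf_cbind))   # default names, not chrom/length/is_circular
str(inf_cbind)

# Assumed layout for chrominfo (see ?makeTranscriptDb): one row per sequence,
# with a character name, an integer length and a logical circularity flag.
chrominfo <- data.frame(
  chrom       = "NC_008463",
  length      = 6537648L,
  is_circular = TRUE,
  stringsAsFactors = FALSE
)
str(chrominfo)

# The call from the post would then become (not run here, needs GenomicFeatures
# and the GFF3 file):
# txdb <- makeTranscriptDbFromGFF(file = "NC_008463.gff", format = "gff3",
#                                 dataSource = "CDS",
#                                 species = "Pseudomonas aeruginosa",
#                                 chrominfo = chrominfo)
```

Whether a correctly typed chrominfo also makes the 'tables' error go away is not established anywhere in this thread; the sketch only removes one possible source of confusion.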


