[BioC] Error running makeTranscriptDbFromGFF in GenomicFeatures

Jon Bråte jon.brate at ibv.uio.no
Thu Sep 4 17:14:24 CEST 2014


Thanks Michael,

Yes, you are right. Many of the transcripts span multiple chromosomes (or rather scaffolds: this is a poorly assembled genome, which is probably why there appears to be so much trans-splicing).

I think removing the trans-spliced genes would remove too many genes, so I will try to do this another way.
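
Since the goal is only the number of exons per gene, one possible route that sidesteps exon ranking entirely is to count exon rows per gene straight from the GFF3 table. The following is a minimal base R sketch, not a tested recipe for this file. It assumes that each exon's `Parent=` attribute names a transcript and each mRNA's `Parent=` names a gene; the toy table, its IDs, and the helper `get_attr` are made up for illustration.

```r
# Toy GFF3-like table; all IDs are hypothetical.
gff <- data.frame(
  type  = c("gene", "mRNA", "exon", "exon", "gene", "mRNA", "exon"),
  attrs = c("ID=g1", "ID=tx1;Parent=g1", "Parent=tx1", "Parent=tx1",
            "ID=g2", "ID=tx2;Parent=g2", "Parent=tx2"),
  stringsAsFactors = FALSE
)

# Extract the value of one attribute tag; NA when the tag is absent.
# (Hypothetical helper; does not handle comma-separated multiple parents.)
get_attr <- function(x, tag) {
  x <- paste0(";", x)
  pattern <- paste0(".*;", tag, "=([^;]+).*")
  ifelse(grepl(pattern, x), sub(pattern, "\\1", x), NA_character_)
}

mrna  <- gff[gff$type == "mRNA", ]
exons <- gff[gff$type == "exon", ]

# Map transcript ID -> gene ID, then look up the gene of every exon.
tx2gene   <- setNames(get_attr(mrna$attrs, "Parent"),
                      get_attr(mrna$attrs, "ID"))
exon_gene <- unname(tx2gene[get_attr(exons$attrs, "Parent")])

# Number of exon rows per gene. Exons shared between several transcripts
# of the same gene are counted once per transcript here.
exons_per_gene <- table(exon_gene)
print(exons_per_gene)
```

The counts could then be plotted with something like `hist(as.integer(exons_per_gene))`. Because nothing here depends on exon order, exons lying on different scaffolds do not matter for the count.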

Thank you,

Jon


On 4. sep. 2014, at 13:56, Michael Lawrence wrote:

I think the error messages are a pretty good clue to what's wrong here. The TxDb needs to know the "rank" (the order within the transcript) of each exon. It tries to infer this from the exon positions, but that fails when exons of the same transcript fall on multiple chromosomes (trans-splicing).

For the GTF, the parser is complaining that some entries in the attribute column do not follow the 'tag value' format. You could find the offending line(s) by recursively cutting the file in half until the error goes away.
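
To see how many transcripts are affected, one could list the transcripts whose exons sit on more than one sequence or strand. This is a minimal base R sketch, assuming a GFF3 in which exon rows carry a single `Parent=` attribute; the toy table and its IDs are made up for illustration, and this is not how GenomicFeatures itself performs the check.

```r
# Toy GFF3-like exon table; seqids and transcript IDs are hypothetical.
gff <- data.frame(
  seqid  = c("scaffold1", "scaffold1", "scaffold1", "scaffold7"),
  type   = c("exon", "exon", "exon", "exon"),
  start  = c(100L, 400L, 900L, 50L),
  end    = c(200L, 500L, 1000L, 150L),
  strand = c("+", "+", "+", "+"),
  attrs  = c("ID=e1;Parent=tx1", "ID=e2;Parent=tx1",
             "ID=e3;Parent=tx2", "ID=e4;Parent=tx2"),
  stringsAsFactors = FALSE
)

exons <- gff[gff$type == "exon", ]

# Pull the Parent= value out of the attribute column.
exons$tx <- sub(".*Parent=([^;]+).*", "\\1", exons$attrs)

# Count distinct seqid/strand combinations per transcript.
n_loc <- tapply(paste(exons$seqid, exons$strand), exons$tx,
                function(x) length(unique(x)))

# Transcripts whose exons lie on more than one sequence or strand.
trans_spliced <- names(n_loc)[n_loc > 1]
print(trans_spliced)
```

For a real file, the table could be read with `read.delim(path, header = FALSE, comment.char = "#", quote = "")` and the nine columns named accordingly before running the same steps.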

If you want, you could put the files up on dropbox, and I'll take a look at them.

Michael



On Thu, Sep 4, 2014 at 3:23 AM, Jon Bråte <jon.brate at ibv.uio.no> wrote:
Hi list,

I am trying to create a TranscriptDb using GenomicFeatures, but I get an error message. I think there might be something wrong with my GFF file, but I am not sure. I also tried converting the GFF file to GTF, but that gives an error as well.

My goal with this is to plot the number of exons per gene.

Code:

#GFF-file
> txdb = makeTranscriptDbFromGFF(file = "~/Documents/Prosjekter/RNA-project/Data/Sycon_ciliatum/sycon-from-Bergen/gff-files-and-expression-levels/cds.gb.gff3",
+ format = "gff")
extracting transcript information
Extracting gene IDs
extracting transcript information
Processing splicing information for gff3 file.
Deducing exon rank from relative coordinates provided
Warning message:
In .deduceExonRankings(exs, format = "gff") :
  Infering Exon Rankings.  If this is not what you expected, then please be sure that you have provided a valid attribute for exonRankAttributeName
Error in unlist(mapply(.assignRankings, starts, strands)) :
  error in evaluating the argument 'x' in selecting a method for function 'unlist': Error in (function (starts, strands)  :
  Exon rank inference cannot accomodate trans-splicing.

#GTF-file
> txdbGTF = makeTranscriptDbFromGFF(file = "~/Documents/Prosjekter/RNA-project/Data/Sycon_ciliatum/sycon-from-Bergen/gff-files-and-expression-levels/cds.gb.gtf",
+ format = "gtf")
Error in .parse_attrCol(attrCol, file, colnames) :
  Some attributes do not conform to 'tag value' format
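
As an alternative to bisecting the file, the attribute column (column 9) could be screened directly for entries that do not look like `tag value` pairs. This is a rough base R sketch; the regular expression is only an approximation of the GTF convention, not the rule the parser actually applies, and the example strings and the helper `conforms` are hypothetical.

```r
# Hypothetical attribute strings (column 9 of a GTF). The second uses
# GFF3-style 'tag=value'; the third has a bare tag without a value.
attr_col <- c(
  'gene_id "g1"; transcript_id "t1";',
  'gene_id=g2;transcript_id=t2',
  'gene_id "g3"; partial;'
)

# TRUE if every ';'-separated field looks like: tag <whitespace> value.
# Note: a ';' inside a quoted value would be split incorrectly here.
conforms <- function(x) {
  fields <- trimws(strsplit(x, ";", fixed = TRUE)[[1]])
  fields <- fields[nzchar(fields)]
  length(fields) > 0 && all(grepl("^\\S+\\s+\\S", fields, perl = TRUE))
}

ok <- vapply(attr_col, conforms, logical(1), USE.NAMES = FALSE)

# Indices of attribute strings that fail the check.
bad_lines <- which(!ok)
print(bad_lines)
```

Applied to the ninth column of the real GTF, the resulting indices would point at the rows worth inspecting by eye.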


> sessionInfo()
R version 3.1.0 (2014-04-10)
Platform: x86_64-apple-darwin10.8.0 (64-bit)

locale:
[1] C

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] GenomicFeatures_1.16.2 AnnotationDbi_1.26.0   Biobase_2.24.0         GenomicRanges_1.16.3
[5] GenomeInfoDb_1.0.2     IRanges_1.22.10        BiocGenerics_0.10.0

loaded via a namespace (and not attached):
 [1] BBmisc_1.7              BSgenome_1.32.0         BatchJobs_1.3           BiocParallel_0.6.1
 [5] Biostrings_2.32.1       DBI_0.2-7               GenomicAlignments_1.0.5 RCurl_1.95-4.3
 [9] RSQLite_0.11.4          Rcpp_0.11.2             Rsamtools_1.16.1        XML_3.98-1.1
[13] XVector_0.4.0           biomaRt_2.20.0          bitops_1.0-6            brew_1.0-6
[17] checkmate_1.3           codetools_0.2-9         digest_0.6.4            fail_1.2
[21] foreach_1.4.2           iterators_1.0.7         rtracklayer_1.24.2      sendmailR_1.1-2
[25] stats4_3.1.0            stringr_0.6.2           tools_3.1.0             zlibbioc_1.10.0


----------------------------------------------------------------
Jon Bråte

Section for Genetics and Evolutionary Biology (EVOGENE)
Department of Biosciences
University of Oslo
P.B. 1066 Blindern
N-0316, Norway
Email: jon.brate at ibv.uio.no
Phone: 922 44 582
Web: mn.uio.no/ibv/english/people/aca/jonbra/index.html







_______________________________________________
Bioconductor mailing list
Bioconductor at r-project.org
https://stat.ethz.ch/mailman/listinfo/bioconductor
Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor



