[BioC] Error running makeTranscriptDbFromGFF in GenomicFeatures

Michael Lawrence lawrence.michael at gene.com
Thu Sep 4 18:40:12 CEST 2014


I would recommend calling import() from rtracklayer:

gr <- import(gff)

Then subset to the features of type exon and tabulate them by parent.
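A minimal sketch of that suggestion, under some assumptions: the GFF3 has features typed "exon" carrying a Parent attribute, and the toy file, IDs and scaffold names below are made up for illustration. It also flags parents whose exons sit on more than one scaffold, since those are the ones that trip up exon rank inference.

```r
library(rtracklayer)  # Bioconductor; also attaches GenomicRanges

# Toy GFF3 standing in for the real file (hypothetical IDs and scaffolds).
gff_lines <- c(
  "##gff-version 3",
  "scaffold1\ttoy\tmRNA\t1\t500\t.\t+\t.\tID=tx1",
  "scaffold1\ttoy\texon\t1\t100\t.\t+\t.\tID=e1;Parent=tx1",
  "scaffold1\ttoy\texon\t201\t300\t.\t+\t.\tID=e2;Parent=tx1",
  "scaffold1\ttoy\texon\t401\t500\t.\t+\t.\tID=e3;Parent=tx1",
  "scaffold1\ttoy\tmRNA\t600\t700\t.\t+\t.\tID=tx2",
  "scaffold1\ttoy\texon\t600\t700\t.\t+\t.\tID=e4;Parent=tx2",
  "scaffold2\ttoy\texon\t1\t50\t.\t+\t.\tID=e5;Parent=tx2"
)
gff <- tempfile(fileext = ".gff3")
writeLines(gff_lines, gff)

# Import everything as a GRanges, then keep only the exon features.
gr <- import(gff)
exons <- gr[gr$type == "exon"]

# Parent is a list column (an exon may have several parents), so flatten it.
parent <- unlist(exons$Parent, use.names = FALSE)

# Number of exons per parent.
exons_per_parent <- table(parent)

# Repeat each exon's scaffold once per parent so it lines up with 'parent',
# then count the distinct scaffolds each parent touches.
scaffold <- rep(as.character(seqnames(exons)), lengths(exons$Parent))
n_scaffolds <- tapply(scaffold, parent, function(x) length(unique(x)))
multi_scaffold <- names(n_scaffolds)[n_scaffolds > 1]

print(exons_per_parent)
print(multi_scaffold)
```

On real data, hist(as.vector(exons_per_parent)) would give a quick picture of the distribution. Note that in GFF3 an exon's Parent is normally a transcript rather than a gene, so counting exons per gene would need an extra transcript-to-gene mapping step.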

Michael



On Thu, Sep 4, 2014 at 8:14 AM, Jon Bråte <jon.brate at ibv.uio.no> wrote:

>  Thanks Michael,
>
>  Yes, you are right. Many of the transcripts span multiple chromosomes
> (or rather scaffolds: this is a poorly assembled genome, which is
> probably why there appears to be so much trans-splicing).
>
>  I think removing the trans-spliced genes would discard too many genes,
> so I will try to do this another way.
>
>  Thank you,
>
>  Jon
>
>
>   On 4 Sep 2014, at 13:56, Michael Lawrence wrote:
>
>   I think the error messages are a pretty good clue to what's wrong here.
> The TxDb needs to know the "rank" (the order within the transcript) of each
> exon. It tries to infer this from the positions, but this obviously fails
> when exons within the same transcript fall on multiple chromosomes
> (trans-splicing). The GTF error is a separate problem: some attributes in
> the file do not follow the 'tag value' format the parser expects. You could
> find the offending line(s) by repeatedly cutting the file in half until the
> error goes away.
>
>  If you want, you could put the files up on dropbox, and I'll take a look
> at them.
>
>  Michael
>
>
>
> On Thu, Sep 4, 2014 at 3:23 AM, Jon Bråte <jon.brate at ibv.uio.no> wrote:
>
>> Hi list,
>>
>> I am trying to create a TranscriptDb using GenomicFeatures, but I get an
>> error message. I think there might be something wrong with my GFF file, but
>> I am not sure. I also tried converting the GFF file to GTF, but that also
>> gives an error.
>>
>> My goal with this is to plot the number of exons per gene.
>>
>> Code:
>>
>> #GFF-file
>> > txdb = makeTranscriptDbFromGFF(file =
>> "~/Documents/Prosjekter/RNA-project/Data/Sycon_ciliatum/sycon-from-Bergen/gff-files-and-expression-levels/cds.gb.gff3",
>> + format = "gff")
>> extracting transcript information
>> Extracting gene IDs
>> extracting transcript information
>> Processing splicing information for gff3 file.
>> Deducing exon rank from relative coordinates provided
>> Warning message:
>> In .deduceExonRankings(exs, format = "gff") :
>>   Infering Exon Rankings.  If this is not what you expected, then please
>> be sure that you have provided a valid attribute for exonRankAttributeName
>> Error in unlist(mapply(.assignRankings, starts, strands)) :
>>   error in evaluating the argument 'x' in selecting a method for function
>> 'unlist': Error in (function (starts, strands)  :
>>   Exon rank inference cannot accomodate trans-splicing.
>>
>> #GTF-file
>> > txdbGTF = makeTranscriptDbFromGFF(file =
>> "~/Documents/Prosjekter/RNA-project/Data/Sycon_ciliatum/sycon-from-Bergen/gff-files-and-expression-levels/cds.gb.gtf",
>> + format = "gtf")
>> Error in .parse_attrCol(attrCol, file, colnames) :
>>   Some attributes do not conform to 'tag value' format
>>
>>
>> > sessionInfo()
>> R version 3.1.0 (2014-04-10)
>> Platform: x86_64-apple-darwin10.8.0 (64-bit)
>>
>> locale:
>> [1] C
>>
>> attached base packages:
>> [1] parallel  stats     graphics  grDevices utils     datasets  methods   base
>>
>> other attached packages:
>> [1] GenomicFeatures_1.16.2 AnnotationDbi_1.26.0   Biobase_2.24.0         GenomicRanges_1.16.3
>> [5] GenomeInfoDb_1.0.2     IRanges_1.22.10        BiocGenerics_0.10.0
>>
>> loaded via a namespace (and not attached):
>>  [1] BBmisc_1.7              BSgenome_1.32.0         BatchJobs_1.3           BiocParallel_0.6.1
>>  [5] Biostrings_2.32.1       DBI_0.2-7               GenomicAlignments_1.0.5 RCurl_1.95-4.3
>>  [9] RSQLite_0.11.4          Rcpp_0.11.2             Rsamtools_1.16.1        XML_3.98-1.1
>> [13] XVector_0.4.0           biomaRt_2.20.0          bitops_1.0-6            brew_1.0-6
>> [17] checkmate_1.3           codetools_0.2-9         digest_0.6.4            fail_1.2
>> [21] foreach_1.4.2           iterators_1.0.7         rtracklayer_1.24.2      sendmailR_1.1-2
>> [25] stats4_3.1.0            stringr_0.6.2           tools_3.1.0             zlibbioc_1.10.0
>>
>>
>> ----------------------------------------------------------------
>> Jon Bråte
>>
>> Section for Genetics and Evolutionary Biology (EVOGENE)
>> Department of Biosciences
>> University of Oslo
>> P.B. 1066 Blindern
>> N-0316, Norway
>> Email: jon.brate at ibv.uio.no
>> Phone: 922 44 582
>> Web: http://mn.uio.no/ibv/english/people/aca/jonbra/index.html
>>
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at r-project.org
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives:
>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>
>