[BioC] GenomicFeatures: Problem with makeTranscriptDbFromGFF

Katja Hebestreit katjah at stanford.edu
Mon Apr 14 20:24:35 CEST 2014


You can download the file here:

https://www.dropbox.com/s/04nck83jq6r91bc/mm9_test.gtf

Using this file, I get the following error:

txdb <- makeTranscriptDbFromGFF(file="Data/mm9_test.gtf", format="gtf")
Error in .parse_attrCol(attrCol, file, colnames) : 
  Some attributes do not conform to 'tag value' format

Thank you so much for helping!!
Katja
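
One way to narrow down which records trigger this error is to scan the attribute column directly. The following is a minimal base R sketch, not the check that GenomicFeatures performs internally: the helper name find_bad_attributes and the regular expression used for 'tag value' are assumptions made for illustration. It flags lines that do not have nine tab-separated fields, or whose ninth field contains an entry that is not a tag followed by a value.

```r
# Hypothetical helper (not part of GenomicFeatures or rtracklayer):
# return the line numbers of GTF records whose attribute column looks malformed.
find_bad_attributes <- function(lines) {
  # Skip comment and blank lines, but keep the original line numbers.
  idx <- which(!grepl("^#", lines) & nzchar(trimws(lines)))
  fields <- strsplit(lines[idx], "\t", fixed = TRUE)
  bad <- vapply(fields, function(f) {
    # A GTF record is expected to have exactly nine tab-separated columns.
    if (length(f) != 9L) return(TRUE)
    attrs <- trimws(strsplit(f[9], ";", fixed = TRUE)[[1]])
    attrs <- attrs[nzchar(attrs)]
    # Assumed pattern: a tag, whitespace, then a quoted or bare value.
    ok <- grepl('^[A-Za-z_][A-Za-z0-9_.]*[ ]+("[^"]*"|[^" ]+)$', attrs)
    length(attrs) == 0L || !all(ok)
  }, logical(1))
  idx[bad]
}

# One record taken from this thread (tab-delimited) and one deliberately
# broken record constructed for illustration (gene_id has no value).
good <- 'chr1\tmm9_refFlat\texon\t3204563\t3207049\t0.000000\t-\t.\tgene_id "Xkr4"; transcript_id "Xkr4";'
bad  <- 'chr1\tmm9_refFlat\texon\t3204563\t3207049\t0.000000\t-\t.\tgene_id; transcript_id "Xkr4";'
print(find_bad_attributes(c(good, bad)))  # 2
```

To run it on the real file, something like find_bad_attributes(readLines("Data/mm9_test.gtf")) should return the numbers of the suspicious lines, if any. A record that is delimited by spaces rather than tabs is also reported, because it does not split into nine fields.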


----- Original Message -----
From: "Michael Lawrence" <lawrence.michael at gene.com>
To: "Katja Hebestreit" <katjah at stanford.edu>
Cc: "Michael Lawrence" <lawrence.michael at gene.com>, bioconductor at r-project.org, "Rsamtools Maintainer" <maintainer at bioconductor.org>
Sent: Monday, April 14, 2014 7:27:26 AM
Subject: Re: [BioC] GenomicFeatures: Problem with makeTranscriptDbFromGFF

Well, I copied the text and replaced the spaces with tabs as appropriate
and everything seemed to work fine, so you might want to attach that fragment of
the file, just to be sure it isn't a formatting issue.

Does import("file.gtf") work for you? If so, that should be good enough for
your use case.

Michael
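
Since the stated goal is only to determine gene lengths, a TranscriptDb may not be strictly necessary. The sketch below is a base R illustration under several assumptions: that "gene length" means the number of bases covered by the union of a gene's exon records, that each gene_id sits on a single chromosome, and that the file is tab-delimited. gene_lengths_from_gtf is a hypothetical helper, not part of rtracklayer or GenomicFeatures.

```r
# Hypothetical helper: total exonic bases per gene_id from GTF text lines.
gene_lengths_from_gtf <- function(lines) {
  lines <- lines[!grepl("^#", lines) & nzchar(trimws(lines))]
  if (length(lines) == 0L) return(numeric(0))
  gtf <- read.delim(text = lines, header = FALSE, sep = "\t", quote = "",
                    comment.char = "", stringsAsFactors = FALSE)
  exons <- gtf[gtf[[3]] == "exon", , drop = FALSE]
  # Pull the quoted gene_id value out of the attribute column.
  gene <- sub('.*gene_id "([^"]*)".*', "\\1", exons[[9]])

  # Width of the union of 1-based, inclusive intervals.
  union_width <- function(start, end) {
    o <- order(start)
    start <- start[o]
    end <- end[o]
    total <- 0
    cur_s <- start[1]
    cur_e <- end[1]
    for (i in seq_along(start)[-1]) {
      if (start[i] <= cur_e + 1) {
        cur_e <- max(cur_e, end[i])            # overlapping or adjacent: extend
      } else {
        total <- total + (cur_e - cur_s + 1)   # close the current block
        cur_s <- start[i]
        cur_e <- end[i]
      }
    }
    total + (cur_e - cur_s + 1)
  }

  vapply(split(seq_len(nrow(exons)), gene),
         function(i) union_width(exons[[4]][i], exons[[5]][i]),
         numeric(1))
}

# The three exon records quoted in this thread, rebuilt with tabs.
demo_attr <- 'gene_id "Xkr4"; transcript_id "Xkr4";'
demo <- c(
  paste("chr1", "mm9_refFlat", "exon", "3204563", "3207049", "0.000000", "-", ".", demo_attr, sep = "\t"),
  paste("chr1", "mm9_refFlat", "exon", "3411783", "3411982", "0.000000", "-", ".", demo_attr, sep = "\t"),
  paste("chr1", "mm9_refFlat", "exon", "3660633", "3661579", "0.000000", "-", ".", demo_attr, sep = "\t")
)
print(gene_lengths_from_gtf(demo))  # Xkr4: 3634 (2487 + 200 + 947)
```

If import() from rtracklayer does read the file, the same idea can presumably be expressed on the resulting GRanges by splitting the exon ranges by gene_id and summing the widths of the reduced ranges.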


On Sun, Apr 13, 2014 at 10:14 PM, Katja Hebestreit <katjah at stanford.edu> wrote:

> Actually, the error was not reproducible with the lines I attached. But it
> is reproducible with these lines (four additional lines):
>
> chr1    mm9_refFlat     stop_codon      3206103 3206105 0.000000        -       .       gene_id "Xkr4"; transcript_id "Xkr4";
> chr1    mm9_refFlat     CDS     3206106 3207049 0.000000        -       2       gene_id "Xkr4"; transcript_id "Xkr4";
> chr1    mm9_refFlat     exon    3204563 3207049 0.000000        -       .       gene_id "Xkr4"; transcript_id "Xkr4";
> chr1    mm9_refFlat     CDS     3411783 3411982 0.000000        -       1       gene_id "Xkr4"; transcript_id "Xkr4";
> chr1    mm9_refFlat     exon    3411783 3411982 0.000000        -       .       gene_id "Xkr4"; transcript_id "Xkr4";
> chr1    mm9_refFlat     CDS     3660633 3661429 0.000000        -       0       gene_id "Xkr4"; transcript_id "Xkr4";
> chr1    mm9_refFlat     start_codon     3661427 3661429 0.000000        -       .       gene_id "Xkr4"; transcript_id "Xkr4";
> chr1    mm9_refFlat     exon    3660633 3661579 0.000000        -       .       gene_id "Xkr4"; transcript_id "Xkr4";
> chr1    mm9_refFlat     stop_codon      4283062 4283064 0.000000        -       .       gene_id "Rp1"; transcript_id "Rp1";
> chr1    mm9_refFlat     CDS     4283065 4283093 0.000000        -       2       gene_id "Rp1"; transcript_id "Rp1";
>
> Let me know if you would like to get the entire file.
>
> Thank you!!
> Katja
>
> ----- Original Message -----
> From: "Michael Lawrence" <lawrence.michael at gene.com>
> To: "Katja Hebestreit" <katjah at stanford.edu>
> Cc: bioconductor at r-project.org, "Rsamtools Maintainer" <maintainer at bioconductor.org>
> Sent: Sunday, April 13, 2014 10:02:13 PM
> Subject: Re: [BioC] GenomicFeatures: Problem with makeTranscriptDbFromGFF
>
> On Sun, Apr 13, 2014 at 7:18 PM, Katja Hebestreit <katjah at stanford.edu> wrote:
>
> > Hello,
> >
> > I get an error when I try to import my gff file:
> >
> > txdb <- makeTranscriptDbFromGFF(file="file.gtf", format="gtf")
> >
> > Error in .parse_attrCol(attrCol, file, colnames) :
> >   Some attributes do not conform to 'tag value' format
> >
> > This is what my file looks like:
> >
> > chr1    mm9_refFlat     stop_codon      3206103 3206105 0.000000        -       .       gene_id "Xkr4"; transcript_id "Xkr4";
> > chr1    mm9_refFlat     CDS     3206106 3207049 0.000000        -       2       gene_id "Xkr4"; transcript_id "Xkr4";
> > chr1    mm9_refFlat     exon    3204563 3207049 0.000000        -       .       gene_id "Xkr4"; transcript_id "Xkr4";
> > chr1    mm9_refFlat     CDS     3411783 3411982 0.000000        -       1       gene_id "Xkr4"; transcript_id "Xkr4";
> > chr1    mm9_refFlat     exon    3411783 3411982 0.000000        -       .       gene_id "Xkr4"; transcript_id "Xkr4";
> > chr1    mm9_refFlat     CDS     3660633 3661429 0.000000        -       0       gene_id "Xkr4"; transcript_id "Xkr4";
> >
> > I have the feeling that this has something to do with the missing exon
> > rank information in my file. Is that true? Is there a way to import this
> > file? All I want to do is to determine the gene lengths.
> >
>
> It is most likely as the error says: some of your attributes are malformed.
> Is that the entire file listed above, or is there more? If you could get me
> the file somehow I could diagnose the issue.
>
>
> >
> > Could anyone help? That would be awesome!
> > Cheers,
> > Katja
> >
> >
> > sessionInfo()
> > R version 3.1.0 (2014-04-10)
> > Platform: x86_64-unknown-linux-gnu (64-bit)
> >
> > locale:
> >  [1] LC_CTYPE=de_DE.UTF-8       LC_NUMERIC=C
> >  [3] LC_TIME=de_DE.UTF-8        LC_COLLATE=de_DE.UTF-8
> >  [5] LC_MONETARY=de_DE.UTF-8    LC_MESSAGES=de_DE.UTF-8
> >  [7] LC_PAPER=de_DE.UTF-8       LC_NAME=C
> >  [9] LC_ADDRESS=C               LC_TELEPHONE=C
> > [11] LC_MEASUREMENT=de_DE.UTF-8 LC_IDENTIFICATION=C
> >
> > attached base packages:
> > [1] parallel  stats     graphics  grDevices utils     datasets  methods
> > [8] base
> >
> > other attached packages:
> > [1] GenomicFeatures_1.16.0 AnnotationDbi_1.25.19  Biobase_2.23.6
> > [4] GenomicRanges_1.16.0   GenomeInfoDb_0.99.32   IRanges_1.21.45
> > [7] BiocGenerics_0.9.3     BiocInstaller_1.14.1
> >
> > loaded via a namespace (and not attached):
> >  [1] BatchJobs_1.2           BBmisc_1.5              BiocParallel_0.6.0
> >  [4] biomaRt_2.20.0          Biostrings_2.32.0       bitops_1.0-6
> >  [7] brew_1.0-6              BSgenome_1.32.0         codetools_0.2-8
> > [10] DBI_0.2-7               digest_0.6.4            fail_1.2
> > [13] foreach_1.4.2           GenomicAlignments_1.0.0 iterators_1.0.7
> > [16] plyr_1.8.1              Rcpp_0.11.1             RCurl_1.95-4.1
> > [19] Rsamtools_1.16.0        RSQLite_0.11.4          rtracklayer_1.24.0
> > [22] sendmailR_1.1-2         stats4_3.1.0            stringr_0.6.2
> > [25] tools_3.1.0             XML_3.98-1.1            XVector_0.4.0
> > [28] zlibbioc_1.10.0
> >
> > _______________________________________________
> > Bioconductor mailing list
> > Bioconductor at r-project.org
> > https://stat.ethz.ch/mailman/listinfo/bioconductor
> > Search the archives:
> > http://news.gmane.org/gmane.science.biology.informatics.conductor
> >
>
