[BioC] GenomicFeatures: Problem with makeTranscriptDbFromGFF

Katja Hebestreit katjah at stanford.edu
Mon Apr 14 07:14:49 CEST 2014


Actually, the error was not reproducible with the lines I attached. It is reproducible with these lines, though (four additional lines):

chr1	mm9_refFlat	stop_codon	3206103	3206105	0.000000	-	.	gene_id "Xkr4"; transcript_id "Xkr4"; 
chr1	mm9_refFlat	CDS	3206106	3207049	0.000000	-	2	gene_id "Xkr4"; transcript_id "Xkr4"; 
chr1	mm9_refFlat	exon	3204563	3207049	0.000000	-	.	gene_id "Xkr4"; transcript_id "Xkr4"; 
chr1	mm9_refFlat	CDS	3411783	3411982	0.000000	-	1	gene_id "Xkr4"; transcript_id "Xkr4"; 
chr1	mm9_refFlat	exon	3411783	3411982	0.000000	-	.	gene_id "Xkr4"; transcript_id "Xkr4"; 
chr1	mm9_refFlat	CDS	3660633	3661429	0.000000	-	0	gene_id "Xkr4"; transcript_id "Xkr4"; 
chr1	mm9_refFlat	start_codon	3661427	3661429	0.000000	-	.	gene_id "Xkr4"; transcript_id "Xkr4"; 
chr1	mm9_refFlat	exon	3660633	3661579	0.000000	-	.	gene_id "Xkr4"; transcript_id "Xkr4"; 
chr1	mm9_refFlat	stop_codon	4283062	4283064	0.000000	-	.	gene_id "Rp1"; transcript_id "Rp1"; 
chr1	mm9_refFlat	CDS	4283065	4283093	0.000000	-	2	gene_id "Rp1"; transcript_id "Rp1"; 

Let me know if you would like the entire file.

Thank you!!
Katja

----- Original Message -----
From: "Michael Lawrence" <lawrence.michael at gene.com>
To: "Katja Hebestreit" <katjah at stanford.edu>
Cc: bioconductor at r-project.org, "Rsamtools Maintainer" <maintainer at bioconductor.org>
Sent: Sunday, April 13, 2014 10:02:13 PM
Subject: Re: [BioC] GenomicFeatures: Problem with makeTranscriptDbFromGFF

On Sun, Apr 13, 2014 at 7:18 PM, Katja Hebestreit <katjah at stanford.edu> wrote:

> Hello,
>
> I get an error when I try to import my gff file:
>
> txdb <- makeTranscriptDbFromGFF(file="file.gtf", format="gtf")
>
> Error in .parse_attrCol(attrCol, file, colnames) :
>   Some attributes do not conform to 'tag value' format
>
> This is what my file looks like:
>
> chr1    mm9_refFlat     stop_codon      3206103 3206105 0.000000        -       .       gene_id "Xkr4"; transcript_id "Xkr4";
> chr1    mm9_refFlat     CDS     3206106 3207049 0.000000        -       2       gene_id "Xkr4"; transcript_id "Xkr4";
> chr1    mm9_refFlat     exon    3204563 3207049 0.000000        -       .       gene_id "Xkr4"; transcript_id "Xkr4";
> chr1    mm9_refFlat     CDS     3411783 3411982 0.000000        -       1       gene_id "Xkr4"; transcript_id "Xkr4";
> chr1    mm9_refFlat     exon    3411783 3411982 0.000000        -       .       gene_id "Xkr4"; transcript_id "Xkr4";
> chr1    mm9_refFlat     CDS     3660633 3661429 0.000000        -       0       gene_id "Xkr4"; transcript_id "Xkr4";
>
> I have the feeling that this has something to do with the missing exon
> rank information in my file. Is that true? Is there a way to import this
> file? All I want to do is to determine the gene lengths.
>

It is most likely as the error says: some of your attributes are malformed.
Is that the entire file listed above, or is there more? If you could get me
the file somehow I could diagnose the issue.
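
In the meantime, one way to look for malformed attributes yourself is sketched below in base R. This is only a rough check under stated assumptions, not the test that makeTranscriptDbFromGFF applies internally: it assumes nine tab-separated fields per record and a ninth column made up of `tag value;` pairs, each ending in a semicolon. The helper name check_gtf_attributes and the truncated second example line are made up for illustration.

```r
# Hypothetical helper, not part of GenomicFeatures: report, for each GTF
# record, how many tab-separated fields it has and whether the ninth
# (attribute) column consists only of `tag value;` pairs.
check_gtf_attributes <- function(lines) {
  # Skip empty lines and comment lines, but remember original line numbers.
  keep <- which(nzchar(lines) & !grepl("^#", lines))
  fields <- strsplit(lines[keep], "\t", fixed = TRUE)
  n_fields <- vapply(fields, length, integer(1))
  attrs <- vapply(
    fields,
    function(f) if (length(f) >= 9) f[[9]] else NA_character_,
    character(1)
  )
  # One pair: a tag, spaces, then a quoted or bare value.
  pair <- "[A-Za-z_][A-Za-z0-9_.]*[ ]+(\"[^\"]*\"|[^ ;\"]+)"
  # Assumption: every pair, including the last one, ends with a semicolon.
  pattern <- paste0("^[ ]*(", pair, "[ ]*;[ ]*)+$")
  ok <- n_fields == 9L & !is.na(attrs) & grepl(pattern, attrs)
  data.frame(line = keep, n_fields = n_fields, ok = ok)
}

# First line: a record from the thread. Second line: deliberately truncated
# (made up) so that the check has something to flag.
example <- c(
  "chr1\tmm9_refFlat\texon\t3204563\t3207049\t0.000000\t-\t.\tgene_id \"Xkr4\"; transcript_id \"Xkr4\"; ",
  "chr1\tmm9_refFlat\tCDS\t4283065\t4283093\t0.000000\t-\t2\tgene_id \"Rp1\"; transcript_id"
)
res <- check_gtf_attributes(example)
res[!res$ok, ]
```

On the real file, something like check_gtf_attributes(readLines("file.gtf")) would list the candidate lines. A record that passes this check can still be rejected by the importer, so treat the result as a hint about where to look rather than a verdict.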


>
> Could anyone help? That would be awesome!
> Cheers,
> Katja
>
>
> sessionInfo()
> R version 3.1.0 (2014-04-10)
> Platform: x86_64-unknown-linux-gnu (64-bit)
>
> locale:
>  [1] LC_CTYPE=de_DE.UTF-8       LC_NUMERIC=C
>  [3] LC_TIME=de_DE.UTF-8        LC_COLLATE=de_DE.UTF-8
>  [5] LC_MONETARY=de_DE.UTF-8    LC_MESSAGES=de_DE.UTF-8
>  [7] LC_PAPER=de_DE.UTF-8       LC_NAME=C
>  [9] LC_ADDRESS=C               LC_TELEPHONE=C
> [11] LC_MEASUREMENT=de_DE.UTF-8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] parallel  stats     graphics  grDevices utils     datasets  methods
> [8] base
>
> other attached packages:
> [1] GenomicFeatures_1.16.0 AnnotationDbi_1.25.19  Biobase_2.23.6
> [4] GenomicRanges_1.16.0   GenomeInfoDb_0.99.32   IRanges_1.21.45
> [7] BiocGenerics_0.9.3     BiocInstaller_1.14.1
>
> loaded via a namespace (and not attached):
>  [1] BatchJobs_1.2           BBmisc_1.5              BiocParallel_0.6.0
>  [4] biomaRt_2.20.0          Biostrings_2.32.0       bitops_1.0-6
>  [7] brew_1.0-6              BSgenome_1.32.0         codetools_0.2-8
> [10] DBI_0.2-7               digest_0.6.4            fail_1.2
> [13] foreach_1.4.2           GenomicAlignments_1.0.0 iterators_1.0.7
> [16] plyr_1.8.1              Rcpp_0.11.1             RCurl_1.95-4.1
> [19] Rsamtools_1.16.0        RSQLite_0.11.4          rtracklayer_1.24.0
> [22] sendmailR_1.1-2         stats4_3.1.0            stringr_0.6.2
> [25] tools_3.1.0             XML_3.98-1.1            XVector_0.4.0
> [28] zlibbioc_1.10.0
>


