[BioC] GenomicFeatures: Problem with makeTranscriptDbFromGFF
Katja Hebestreit
katjah at stanford.edu
Mon Apr 14 04:18:43 CEST 2014
Hello,
I get an error when I try to import my GTF file:
txdb <- makeTranscriptDbFromGFF(file="file.gtf", format="gtf")
Error in .parse_attrCol(attrCol, file, colnames) :
Some attributes do not conform to 'tag value' format
This is what my file looks like:
chr1 mm9_refFlat stop_codon 3206103 3206105 0.000000 - . gene_id "Xkr4"; transcript_id "Xkr4";
chr1 mm9_refFlat CDS 3206106 3207049 0.000000 - 2 gene_id "Xkr4"; transcript_id "Xkr4";
chr1 mm9_refFlat exon 3204563 3207049 0.000000 - . gene_id "Xkr4"; transcript_id "Xkr4";
chr1 mm9_refFlat CDS 3411783 3411982 0.000000 - 1 gene_id "Xkr4"; transcript_id "Xkr4";
chr1 mm9_refFlat exon 3411783 3411982 0.000000 - . gene_id "Xkr4"; transcript_id "Xkr4";
chr1 mm9_refFlat CDS 3660633 3661429 0.000000 - 0 gene_id "Xkr4"; transcript_id "Xkr4";
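The error message refers to the ninth (attributes) column, so one way to narrow it down is to check each line of the file directly. The sketch below uses base R only and flags lines that do not have nine tab-separated fields or whose attribute column is not a series of 'tag "value";' pairs. Note that `check_gtf_lines` is a hypothetical helper name, not part of GenomicFeatures, and the regular expression is a simplified assumption about the 'tag value' format, not the exact check the package performs. Whether the real file is tab-separated cannot be told from the lines as pasted above.

```r
# Hypothetical helper (not part of GenomicFeatures): per-line sanity check
# of GTF text. The pattern is a simplified assumption about 'tag value'.
check_gtf_lines <- function(lines) {
  # Ignore comment lines and empty lines
  lines <- lines[nzchar(lines) & !grepl("^#", lines)]
  fields <- strsplit(lines, "\t", fixed = TRUE)
  n_fields <- vapply(fields, length, integer(1))
  # Ninth column holds the attributes; NA when the line has fewer fields
  attr_col <- vapply(
    fields,
    function(f) if (length(f) >= 9) f[9] else NA_character_,
    character(1)
  )
  # One or more of: tag, whitespace, quoted or bare value, semicolon
  pattern <- "^\\s*([A-Za-z_][A-Za-z0-9_.]*\\s+(\"[^\"]*\"|[^\\s;\"]+)\\s*;\\s*)+$"
  ok_attr <- !is.na(attr_col) & grepl(pattern, attr_col, perl = TRUE)
  data.frame(
    line = lines,
    n_fields = n_fields,
    ok_fields = n_fields == 9L,
    ok_attr = ok_attr,
    stringsAsFactors = FALSE
  )
}

# Example: the same record once with tabs and once with spaces
good <- paste("chr1", "mm9_refFlat", "exon", "3204563", "3207049",
              "0.000000", "-", ".",
              'gene_id "Xkr4"; transcript_id "Xkr4";', sep = "\t")
bad <- gsub("\t", " ", good, fixed = TRUE)
report <- check_gtf_lines(c(good, bad))
print(report[, c("n_fields", "ok_fields", "ok_attr")])

# On the real file (path as used above):
# report <- check_gtf_lines(readLines("file.gtf"))
# report[!(report$ok_fields & report$ok_attr), ]
```

Lines reported with `ok_fields == FALSE` or `ok_attr == FALSE` would be the first candidates to inspect.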
I have the feeling that this has something to do with the missing exon rank information in my file. Is that true? Is there a way to import this file? All I want to do is determine the gene lengths.
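For the gene lengths alone, a fallback that does not depend on the import succeeding could look like the sketch below. It is base R only and rests on several assumptions: the file is tab-separated, "gene length" means the total length of the merged (non-overlapping) exons of each `gene_id`, and exons are merged separately per chromosome in case a gene name occurs at more than one locus. `gene_lengths_from_gtf` and `merged_width` are hypothetical helper names.

```r
# Hypothetical helper: summed width of a set of intervals after merging
# overlapping or directly adjacent ones (1-based, inclusive coordinates).
merged_width <- function(start, end) {
  o <- order(start, end)
  start <- start[o]
  end <- end[o]
  total <- 0
  cur_start <- start[1]
  cur_end <- end[1]
  for (i in seq_along(start)[-1]) {
    if (start[i] <= cur_end + 1) {
      cur_end <- max(cur_end, end[i])      # extend the current block
    } else {
      total <- total + (cur_end - cur_start + 1)
      cur_start <- start[i]
      cur_end <- end[i]
    }
  }
  total + (cur_end - cur_start + 1)
}

# Hypothetical helper: merged-exon length per gene_id from GTF lines.
gene_lengths_from_gtf <- function(lines) {
  lines <- lines[nzchar(lines) & !grepl("^#", lines)]
  fields <- strsplit(lines, "\t", fixed = TRUE)
  fields <- fields[vapply(fields, length, integer(1)) >= 9]
  # Keep exon records only
  fields <- fields[vapply(fields, function(f) f[3] == "exon", logical(1))]
  attr_col <- vapply(fields, function(f) f[9], character(1))
  # Assumes gene_id values are double-quoted, as in the lines shown above
  pattern <- "^(?:.*;)?\\s*gene_id\\s+\"([^\"]*)\"\\s*;.*$"
  keep <- grepl(pattern, attr_col, perl = TRUE)
  fields <- fields[keep]
  if (length(fields) == 0) return(numeric(0))
  gene_id <- sub(pattern, "\\1", attr_col[keep], perl = TRUE)
  chr <- vapply(fields, function(f) f[1], character(1))
  start <- as.integer(vapply(fields, function(f) f[4], character(1)))
  end <- as.integer(vapply(fields, function(f) f[5], character(1)))
  idx_by_gene <- split(seq_along(gene_id), gene_id)
  vapply(idx_by_gene, function(i) {
    sum(vapply(split(i, chr[i]),
               function(j) merged_width(start[j], end[j]),
               numeric(1)))
  }, numeric(1))
}

# Example with the exon and CDS records shown above, rebuilt with tabs
make_line <- function(type, start, end) {
  paste("chr1", "mm9_refFlat", type, start, end, "0.000000", "-", ".",
        'gene_id "Xkr4"; transcript_id "Xkr4";', sep = "\t")
}
example <- c(make_line("CDS", 3206106L, 3207049L),
             make_line("exon", 3204563L, 3207049L),
             make_line("CDS", 3411783L, 3411982L),
             make_line("exon", 3411783L, 3411982L))
print(gene_lengths_from_gtf(example))

# On the real file (path as used above):
# gene_lengths_from_gtf(readLines("file.gtf"))
```

If the import does work, the same quantity should be obtainable from the resulting object with `sum(width(reduce(exonsBy(txdb, by="gene"))))`.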
Could anyone help? That would be awesome!
Cheers,
Katja
sessionInfo()
R version 3.1.0 (2014-04-10)
Platform: x86_64-unknown-linux-gnu (64-bit)
locale:
[1] LC_CTYPE=de_DE.UTF-8 LC_NUMERIC=C
[3] LC_TIME=de_DE.UTF-8 LC_COLLATE=de_DE.UTF-8
[5] LC_MONETARY=de_DE.UTF-8 LC_MESSAGES=de_DE.UTF-8
[7] LC_PAPER=de_DE.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=de_DE.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] parallel stats graphics grDevices utils datasets methods
[8] base
other attached packages:
[1] GenomicFeatures_1.16.0 AnnotationDbi_1.25.19 Biobase_2.23.6
[4] GenomicRanges_1.16.0 GenomeInfoDb_0.99.32 IRanges_1.21.45
[7] BiocGenerics_0.9.3 BiocInstaller_1.14.1
loaded via a namespace (and not attached):
[1] BatchJobs_1.2 BBmisc_1.5 BiocParallel_0.6.0
[4] biomaRt_2.20.0 Biostrings_2.32.0 bitops_1.0-6
[7] brew_1.0-6 BSgenome_1.32.0 codetools_0.2-8
[10] DBI_0.2-7 digest_0.6.4 fail_1.2
[13] foreach_1.4.2 GenomicAlignments_1.0.0 iterators_1.0.7
[16] plyr_1.8.1 Rcpp_0.11.1 RCurl_1.95-4.1
[19] Rsamtools_1.16.0 RSQLite_0.11.4 rtracklayer_1.24.0
[22] sendmailR_1.1-2 stats4_3.1.0 stringr_0.6.2
[25] tools_3.1.0 XML_3.98-1.1 XVector_0.4.0
[28] zlibbioc_1.10.0