[BioC] rtracklayer importing gtf files

Sam McInturf [guest] guest at bioconductor.org
Mon Apr 1 22:23:21 CEST 2013


My name is Sam; I am a grad student at the University of Missouri, and I am having trouble importing my .gtf file with the rtracklayer function import().  In a text editor the file appears to have the 9 columns specified by the GTF format, but on import I get 9 columns labeled "X1.", "X2.", ..., "X9.", and nearly all of the entries are NA.  In X1. there are 817 "1\" entries (of 474351), in X2. there are 534 "2\" entries, and so on.  The file is the genes.gtf obtained by unpacking the .tar.gz for the Arabidopsis NCBI TAIR10 release, downloaded from http://tophat.cbcb.umd.edu/igenomes.html
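
The nine-column observation can also be checked outside a text editor. The following is a minimal base-R sketch, not a diagnosis: count.fields() reports how many tab-separated fields each line contains. The two GTF lines are made up for illustration (the coordinates and IDs are placeholders, not copied from genes.gtf), and `tmp` is a hypothetical stand-in for the path to the real file.

```r
# Two hypothetical GTF records, written to a temporary file so that the
# sketch runs on its own. Replace `tmp` with the path to the real genes.gtf.
gtf_lines <- c(
  paste("1", "TAIR10", "exon", "3631", "3913", ".", "+", ".",
        'gene_id "AT1G01010"; transcript_id "AT1G01010.1";', sep = "\t"),
  paste("1", "TAIR10", "exon", "3996", "4276", ".", "+", ".",
        'gene_id "AT1G01010"; transcript_id "AT1G01010.1";', sep = "\t")
)
tmp <- tempfile(fileext = ".gtf")
writeLines(gtf_lines, tmp)

# Count tab-separated fields per line. quote = "" stops the double quotes
# in the attribute column from being treated as field quoting.
n_fields <- count.fields(tmp, sep = "\t", quote = "", comment.char = "#")

# Tabulate the counts; a well-formed GTF should have 9 fields on every line.
print(table(n_fields))

# Print the first raw lines to see which delimiters are actually present.
print(readLines(tmp, n = 2))
```

If any line of the real file reports a count other than 9, the file is probably not tab-delimited in the way the GTF format expects, which would be one possible explanation for the NA columns.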

I import my file with:
library(rtracklayer)
myGTF <- "path/to/file.gtf"
newGTF <- import(myGTF, asRangedData = FALSE)

As I read the import.gff manual, the .gtf extension tells the function how to parse the file without my having to specify the version parameter.

I am trying to follow the summarizeOverlaps() method from the GenomicRanges package to generate read counts for differential expression analysis with DESeq.
 
Does anyone know what has happened or, more generally, what I can do to import my file?

 -- output of sessionInfo(): 

> sessionInfo()
R version 2.15.0 (2012-03-30)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=C                 LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] DESeq_1.8.3         locfit_1.5-8        Biobase_2.16.0     
[4] rtracklayer_1.16.3  Rsamtools_1.8.6     Biostrings_2.24.1  
[7] GenomicRanges_1.8.6 IRanges_1.14.3      BiocGenerics_0.2.0 

loaded via a namespace (and not attached):
 [1] annotate_1.34.0      AnnotationDbi_1.18.0 bitops_1.0-4.1      
 [4] BSgenome_1.24.0      DBI_0.2-5            genefilter_1.38.0   
 [7] geneplotter_1.34.0   grid_2.15.0          lattice_0.20-6      
[10] RColorBrewer_1.0-5   RCurl_1.91-1         RSQLite_0.11.1      
[13] splines_2.15.0       stats4_2.15.0        survival_2.36-14    
[16] tools_2.15.0         XML_3.9-4            xtable_1.7-0        
[19] zlibbioc_1.2.0      


--
Sent via the guest posting facility at bioconductor.org.