[BioC] Genomic Features - makeTranscriptDbFromGFF()

Geoffrey Thomson [guest] guest at bioconductor.org
Mon Jun 16 11:30:05 CEST 2014


I am trying to use derfinder, but it requires my genome features to be in a TranscriptDb object from the GenomicFeatures R package.

Normally one can use the built-in function to make a TranscriptDb from the UCSC database; however, my organism, a plant, is not included in the UCSC database.

Consequently, I am trying to use the makeTranscriptDbFromGFF() function to make the TranscriptDb from a GTF file I created by running TopHat and Cuffmerge on some RNA-Seq samples. However, following the examples, I can't get it to work.

Here is what I've got:

chrominfo <- data.frame(chrom = c("chr1", "chr2", "chr3", "chr4", "chr5", "chr6", "chr7", "chr8"),
                        length = c(52991155, 45729672, 55515152, 56582383, 43630510, 35275713, 49172423, 45569985),
                        is_circular = rep(FALSE, 8))

exons <- makeTranscriptDbFromGFF(file = "~/merged.gtf",
                                 format = "gtf",
                                 exonRankAttributeName = "exon_number",
                                 chrominfo = chrominfo)

Unfortunately this is the output:

extracting transcript information
Estimating transcript ranges.
Extracting gene IDs
Processing splicing information for gtf file.
Prepare the 'metadata' data frame ... metadata: OK
Error in .normargTranscripts(transcripts) :
  values in 'transcripts$tx_strand' must be "+" or "-"
In addition: Warning message:
In if (is.na(chrominfo)) { :
  the condition has length > 1 and only the first element will be used

What am I missing? I also posted this question on Biostar.

This is the top of my gtf file:

chr1    Cufflinks    exon    6524    6620    .    +    .    gene_id "XLOC_000001"; transcript_id "TCONS_00000001"; exon_number "1"; gene_name "Medtr1g004940"; oId "Medtr1g004940.1"; nearest_ref "Medtr1g004940.1"; class_code "="; tss_id "TSS1"; p_id "P1";
chr1    Cufflinks    exon    7098    7366    .    +    .    gene_id "XLOC_000001"; transcript_id "TCONS_00000001"; exon_number "2"; gene_name "Medtr1g004940"; oId "Medtr1g004940.1"; nearest_ref "Medtr1g004940.1"; class_code "="; tss_id "TSS1"; p_id "P1";
chr1    Cufflinks    exon    14514    14556    .    +    .    gene_id "XLOC_000002"; transcript_id "TCONS_00000002"; exon_number "1"; gene_name "Medtr1g004950"; oId "Medtr1g004950.1"; nearest_ref "Medtr1g004950.1"; class_code "="; tss_id "TSS2"; p_id "P2";
chr1    Cufflinks    exon    15503    15729    .    +    .    gene_id "XLOC_000002"; transcript_id "TCONS_00000002"; exon_number "2"; gene_name "Medtr1g004950"; oId "Medtr1g004950.1"; nearest_ref "Medtr1g004950.1"; class_code "="; tss_id "TSS2"; p_id "P2";
chr1    Cufflinks    exon    16283    16326    .    +    .    gene_id "XLOC_000003"; transcript_id "TCONS_00000003"; exon_number "1"; gene_name "Medtr1g004960"; oId "Medtr1g004960.1"; nearest_ref "Medtr1g004960.1"; class_code "="; tss_id "TSS3"; p_id "P3";
chr1    Cufflinks    exon    17061    17304    .    +    .    gene_id "XLOC_000003"; transcript_id "TCONS_00000003"; exon_number "2"; gene_name "Medtr1g004960"; oId "Medtr1g004960.1"; nearest_ref "Medtr1g004960.1"; class_code "="; tss_id "TSS3"; p_id "P3";
chr1    Cufflinks    exon    18242    18382    .    +    .    gene_id "XLOC_000003"; transcript_id "TCONS_00000003"; exon_number "3"; gene_name "Medtr1g004960"; oId "Medtr1g004960.1"; nearest_ref "Medtr1g004960.1"; class_code "="; tss_id "TSS3"; p_id "P3";
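
The lines above all carry "+" in the strand column, so they would not by themselves explain the 'tx_strand' error. One possibility, not confirmed for this file, is that some records further down have a strand other than "+" or "-" (Cufflinks can write "." when it cannot determine the strand). The following is a minimal sketch of how the strand column could be tabulated with base R. count_strands() is a hypothetical helper, not part of GenomicFeatures, and the two-line toy file is made up for illustration:

```r
# Hypothetical helper (not part of GenomicFeatures): tabulate the strand
# column (column 7) of a tab-delimited GTF file using base R only.
count_strands <- function(path) {
  gtf <- read.delim(path, header = FALSE, comment.char = "#",
                    quote = "", stringsAsFactors = FALSE,
                    colClasses = "character")
  table(gtf[[7]], useNA = "ifany")
}

# Toy two-record GTF written to a temporary file for illustration only;
# the second record is invented and has "." as its strand.
toy <- tempfile(fileext = ".gtf")
writeLines(c(
  paste("chr1", "Cufflinks", "exon", "6524", "6620", ".", "+", ".",
        "gene_id \"XLOC_000001\"; transcript_id \"TCONS_00000001\";",
        sep = "\t"),
  paste("chr1", "Cufflinks", "exon", "9000", "9100", ".", ".", ".",
        "gene_id \"XLOC_000009\"; transcript_id \"TCONS_00000009\";",
        sep = "\t")
), toy)

strand_counts <- count_strands(toy)
print(strand_counts)

# Strand values other than "+" or "-" are candidates for the error.
bad <- setdiff(names(strand_counts), c("+", "-"))
print(bad)
```

Calling count_strands("~/merged.gtf") on the real file would show whether any such records exist; if `bad` comes back empty, the cause presumably lies elsewhere.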



 -- output of sessionInfo(): 

R version 3.1.0 (2014-04-10)
Platform: i686-pc-linux-gnu (32-bit)

locale:
 [1] LC_CTYPE=en_NZ.UTF-8       LC_NUMERIC=C               LC_TIME=en_NZ.UTF-8       
 [4] LC_COLLATE=en_NZ.UTF-8     LC_MONETARY=en_NZ.UTF-8    LC_MESSAGES=en_NZ.UTF-8   
 [7] LC_PAPER=en_NZ.UTF-8       LC_NAME=C                  LC_ADDRESS=C              
[10] LC_TELEPHONE=C             LC_MEASUREMENT=en_NZ.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
 [1] splines   grid      parallel  stats     graphics  grDevices utils     datasets 
 [9] methods   base     

other attached packages:
 [1] rtracklayer_1.22.7     GenomicFeatures_1.14.5 AnnotationDbi_1.24.0  
 [4] Biobase_2.22.0         GenomicRanges_1.14.4   XVector_0.2.0         
 [7] derfinder_1.0.2        locfdr_1.1-7           HiddenMarkov_1.7-0    
[10] limma_3.18.13          Genominator_1.16.0     GenomeGraphs_1.22.0   
[13] biomaRt_2.18.0         IRanges_1.20.7         BiocGenerics_0.8.0    
[16] RSQLite_0.11.4         DBI_0.2-7             

loaded via a namespace (and not attached):
[1] Biostrings_2.30.1 bitops_1.0-6      BSgenome_1.30.0   RCurl_1.95-4.1    Rsamtools_1.14.3 
[6] stats4_3.1.0      tools_3.1.0       XML_3.98-1.1      zlibbioc_1.8.0 

--
Sent via the guest posting facility at bioconductor.org.


