[BioC] EasyRNASeq - gff file is not recognized

Nicolas Delhomme delhomme at embl.de
Wed Mar 6 12:46:55 CET 2013


Dear Gabriela,

If you look at the vignette of the package:

vignette("easyRNASeq")

You'll see a short description of the format in section 4.4. More precisely, read the format description in the "genomeIntervals" section on page 16, which describes what your gff3 file should look like. Given the error message you get, either your gff file does not contain the ID key among the attributes (the ninth column) or the ID key is incorrectly formatted.
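
To see which of the two cases applies, a quick check in base R along the following lines might help. This is only a rough sketch: `check_gff_ids` is a hypothetical helper, not part of easyRNASeq or genomeIntervals, and it only tests whether exon rows carry an `ID=` key in the ninth column. It does not verify the 'gene:exon-number' format itself, for which the vignette remains the reference.

```r
# Hypothetical helper (not part of easyRNASeq): summarise whether the exon rows
# of a GFF3 file carry an ID key in the attributes (ninth) column.
check_gff_ids <- function(gff_file) {
  lines <- readLines(gff_file)
  # A GFF3 file is expected to start with the version header.
  has_header <- length(lines) > 0 && grepl("^##gff-version\\s+3", lines[1])
  # Drop comment/directive lines and empty lines before splitting into columns.
  records <- lines[!grepl("^#", lines) & nzchar(lines)]
  fields <- strsplit(records, "\t", fixed = TRUE)
  n_cols <- vapply(fields, length, integer(1))
  # Only rows with exactly nine tab-separated columns are inspected further.
  fields <- fields[n_cols == 9]
  type <- vapply(fields, `[`, character(1), 3)
  attrs <- vapply(fields, `[`, character(1), 9)
  is_exon <- type == "exon"
  # An ID key is an "ID=value" pair at the start of the attributes or after a ";".
  has_id <- grepl("(^|;)\\s*ID=[^;]+", attrs)
  list(
    has_header = has_header,
    n_records = length(records),
    n_malformed = sum(n_cols != 9),
    n_exons = sum(is_exon),
    n_exons_without_id = sum(is_exon & !has_id)
  )
}

# Toy example (made-up records): the second exon has no ID key.
tmp <- tempfile(fileext = ".gff3")
writeLines(c(
  "##gff-version 3",
  "chr1\ttest\tgene\t1\t100\t.\t+\t.\tID=geneA",
  "chr1\ttest\texon\t1\t50\t.\t+\t.\tID=geneA:1;Parent=geneA",
  "chr1\ttest\texon\t60\t100\t.\t+\t.\tParent=geneA"
), tmp)
res <- check_gff_ids(tmp)
print(res)
```

If `n_exons_without_id` is greater than zero for your file, the ID key is missing from at least some exon rows. If it is zero, the key is present and the value's format is the more likely culprit.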

HTH,

Nico

---------------------------------------------------------------
Nicolas Delhomme

Genome Biology Computational Support

European Molecular Biology Laboratory

Tel: +49 6221 387 8310
Email: nicolas.delhomme at embl.de
Meyerhofstrasse 1 - Postfach 10.2209
69102 Heidelberg, Germany
---------------------------------------------------------------





On Mar 6, 2013, at 11:39 AM, Maria Gabriela RL wrote:

> Dear Nico, 
> 
> Many thanks for your response. The gff3 file that I provided can now be read. However, a new error came up. It seems to me that there is something wrong with my gff file. Could you recommend something?
> 
> Again, many thanks for your help,
> 
> Gabriela
> 
> > genes_FGS1 <- easyRNASeq(filesDirectory="/projects/irg/grp_stich/personal_folders/Gabby/NGS_R2/cluster/write/EASYRNASeq/",
> +  gapped=F,
> + validity.check=TRUE,
> + chr.map=chr.map,
> + organism="custom",
> + annotationMethod="gff",
> + annotationFile="/projects/irg/grp_stich/personal_folders/Gabby/NGS_R2/cluster/write/ZmB73_5b_FGS.gff",
> + count="genes",
> + filenames=files,
> + summarization="geneModels",
> + outputFormat="RNAseq")
> Checking arguments...
> Fetching annotations...
> Read 994386 records
> Error in .getGffRange(organismName(obj), filename = filename, ignoreWarnings = ignoreWarnings,  :
>   You gff file misses the ID key defining the exon ID in the gff attributes. The format should be 'gene:exon-number'.
> 
> 
> 
> 
> On Wed, Mar 6, 2013 at 11:09 AM, Nicolas Delhomme <delhomme at embl.de> wrote:
> Dear Gabriela,
> 
> Given that error:
> 
> > Your file: /projects/ZmB73_5b_FGS.gff3 does not contain a gff header: '##gff-version 3' as first line. Is that really a gff3 file?
> 
> 
> your gff3 appears not to contain a header.
> 
> Add the following line:
> 
> ##gff-version 3
> 
> to the beginning of your gff3 file and that should solve the problem.
> 
> Cheers,
> 
> Nico
> 
> 
> On 6 Mar 2013, at 10:57, Gabriela [guest] wrote:
> 
> >
> > Hello,
> >
> > I am trying to generate a table of gene counts to use later with DESeq. However, I got an error message saying that the maize gff file that I am using is wrong. I downloaded this file directly from the Ensembl Plants website.
> >
> > I should mention that I tried both a .gff file and a .gff3 file, and I have the same issue with both. Any hints on how to solve my problem?
> >
> > Many thanks for your help in advance,
> >
> > Gabriela
> >
> > -- output of sessionInfo():
> >
> >> genes_FGS1 <- easyRNASeq(filesDirectory="/projects/EASYRNASeq/",
> > +  gapped=F,
> > + validity.check=TRUE,
> > + chr.map=chr.map,
> > + organism="custom",
> > + annotationMethod="gff",
> > + annotationFile="/projects/ZmB73_5b_FGS.gff3",
> > + count="genes",
> > + filenames=files,
> > + summarization="geneModels",
> > + outputFormat="RNAseq")
> > Checking arguments...
> > Fetching annotations...
> > Error in .readGffGtf(filename = filename, ignoreWarnings = ignoreWarnings,  :
> >
> >>
> >
> >> sessionInfo()
> > R version 2.15.2 (2012-10-26)
> > Platform: x86_64-unknown-linux-gnu (64-bit)
> >
> > locale:
> > [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
> > [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
> > [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
> > [7] LC_PAPER=C                 LC_NAME=C
> > [9] LC_ADDRESS=C               LC_TELEPHONE=C
> > [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
> >
> > attached base packages:
> > [1] grid      parallel  stats     graphics  grDevices utils     datasets
> > [8] methods   base
> >
> > other attached packages:
> > [1] VennDiagram_1.5.1      easyRNASeq_1.4.2       ShortRead_1.16.1
> > [4] latticeExtra_0.6-24    RColorBrewer_1.0-5     BSgenome_1.26.1
> > [7] biomaRt_2.14.0         genomeIntervals_1.14.0 intervals_0.13.3
> > [10] Rsamtools_1.10.1       Biostrings_2.26.2      GenomicRanges_1.10.4
> > [13] IRanges_1.16.4         edgeR_3.0.2            limma_3.14.1
> > [16] pasilla_0.2.13         DESeq_1.10.1           lattice_0.20-10
> > [19] locfit_1.5-8           DEXSeq_1.2.1           Biobase_2.18.0
> > [22] BiocGenerics_0.4.0     pasillaBamSubset_0.0.2
> >
> > loaded via a namespace (and not attached):
> > [1] annotate_1.34.1      AnnotationDbi_1.18.1 bitops_1.0-4.2
> > [4] DBI_0.2-5            genefilter_1.38.0    geneplotter_1.34.0
> > [7] hwriter_1.3          plyr_1.7.1           RCurl_1.91-1
> > [10] RSQLite_0.11.1       splines_2.15.2       statmod_1.4.15
> > [13] stats4_2.15.2        stringr_0.6.1        survival_2.36-14
> > [16] tools_2.15.2         XML_3.9-4            xtable_1.7-0
> > [19] zlibbioc_1.4.0
> >
> >
> > --
> > Sent via the guest posting facility at bioconductor.org.
> 
> 


