[BioC] easyRNA adapting annotations to avoid overlapping synthetic exons

Bayles, Darrell Darrell.Bayles at ARS.USDA.GOV
Thu Feb 28 17:53:02 CET 2013


Dear Nico,

I've read a number of posts in different forums (including BioC) from people wanting to adapt their annotations to deal with overlapping synthetic exons. You indicated in this forum (Wed Jan 9, 2013) that you were working on an example, in the easyRNASeq developer version, of how to perform this kind of adaptation. I would likewise like to remove the overlaps from an annotation I'm working with, but have been stymied in my efforts to modify it. Has that functionality been committed to the development release of easyRNASeq, or can you provide an example of the R workflow needed to remove the overlaps in a gene model computed by easyRNASeq?

> rnaSeq<-easyRNASeq(
+ organism="Btaurus",
+ annotationMethod="gtf",
+ annotationFile="ensembl.gtf",
+ gapped=TRUE, count="genes",
+ summarization="geneModels",
+ pattern="*_B.bam$",
+ filesDirectory=".",
+ outputFormat="RNAseq")
Checking arguments...
Fetching annotations...
Read 478833 records
Computing gene models...
Summarizing counts...
Processing 733_H_0_B.bam
Updating the read length information.
The alignments are gapped.
Minimum length of 1 bp.
Maximum length of 51 bp.
Processing 736_H_0_B.bam
Updating the read length information.
The alignments are gapped.
Minimum length of 1 bp.
Maximum length of 51 bp.
Preparing output
Warning messages:
1: In easyRNASeq(organism = "Btaurus", annotationMethod = "gtf", annotationFile = "genes.gtf",  :
  Your organism has no mapping defined to perform the validity check for the UCSC compliance of the chromosome name.
Defined organism's mapping can be listed using the 'knownOrganisms' function.
To benefit from the validity check, you can provide a 'chr.map' to your 'easyRNASeq' function call.
As you did not do so, 'validity.check' is turned off
2: In .Method(..., deparse.level = deparse.level) :
  number of columns of result is not a multiple of vector length (arg 35)
3: In easyRNASeq(organism = "Btaurus", annotationMethod = "gtf", annotationFile = "genes.gtf",  :
  There are 410 synthetic exons as determined from your annotation that overlap! This implies that some reads will be counted more than once! Is that really what you want?
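
To make the situation behind warning 3 concrete, here is a minimal sketch on toy data. It is not easyRNASeq's own implementation: it assumes only the GenomicRanges package, uses made-up coordinates and hypothetical gene names (geneA, geneB), and treats a synthetic exon as the union of overlapping exons within one gene. Dropping the overlapping synthetic exons is just one possible adaptation, chosen here for simplicity.

```r
library(GenomicRanges)

# Toy exons for two hypothetical genes sharing part of a locus (assumed data)
exons <- GRanges(
  seqnames = "chr1",
  ranges   = IRanges(start = c(100, 150, 400, 180),
                     end   = c(200, 250, 500, 300)),
  strand   = "+",
  gene_id  = c("geneA", "geneA", "geneA", "geneB")
)

# Build synthetic exons: merge overlapping exons within each gene
byGene    <- split(exons, exons$gene_id)
synthetic <- unlist(reduce(byGene))      # element names carry the gene id
synthetic$gene_id <- names(synthetic)

# Each range always overlaps itself, so a count above 1 means it also
# overlaps a synthetic exon belonging to another gene
nHits       <- countOverlaps(synthetic, synthetic)
overlapping <- synthetic[nHits > 1]

# One possible adaptation: keep only synthetic exons that overlap nothing else
nonOverlapping <- synthetic[nHits == 1]
```

Reads falling in the `overlapping` ranges are the ones that would be counted more than once. Whether to drop those ranges, trim them, or assign them to a single gene depends on the analysis.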

> sessionInfo()
R version 2.15.2 (2012-10-26)
Platform: x86_64-redhat-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=C                 LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods
[8] base

other attached packages:
 [1] easyRNASeq_1.4.2       ShortRead_1.16.4       latticeExtra_0.6-24
 [4] RColorBrewer_1.0-5     Rsamtools_1.10.2       DESeq_1.10.1
 [7] lattice_0.20-13        locfit_1.5-8           BSgenome_1.26.1
[10] GenomicRanges_1.10.6   Biostrings_2.26.3      IRanges_1.16.6
[13] edgeR_3.0.8            limma_3.14.4           biomaRt_2.14.0
[16] Biobase_2.18.0         genomeIntervals_1.14.0 BiocGenerics_0.4.0
[19] intervals_0.13.3

loaded via a namespace (and not attached):
 [1] annotate_1.36.0      AnnotationDbi_1.20.3 bitops_1.0-5
 [4] DBI_0.2-5            genefilter_1.40.0    geneplotter_1.36.0
 [7] grid_2.15.2          hwriter_1.3          RCurl_1.95-3
[10] RSQLite_0.11.2       splines_2.15.2       stats4_2.15.2
[13] survival_2.37-2      tools_2.15.2         XML_3.95-0.1
[16] xtable_1.7-1         zlibbioc_1.4.0

Any help is greatly appreciated.

Darrell

==========================================
Darrell O. Bayles, M.S., Ph.D.
USDA, ARS, National Animal Disease Center
Infectious Bacterial Diseases Research Unit
1920 Dayton Ave, Bldg 24
P.O. Box 70
Ames, IA  50010
Tel: (515) 337-7165
Fax: (515) 337-7002
==========================================







