[BioC] easyRNAseq remove overlapping features

Vincent Schulz Vincent.Schulz at yale.edu
Wed Jan 9 15:23:56 CET 2013


Hi Nico,

I would like to use easyRNASeq to count reads for RNA-seq.  The vignette says that "The ideal 
solution is to provide an annotation object that contains no overlapping features. The disjoin 
function from the IRanges package offers a way to achieve this."  I do not have much experience 
with IRanges and related packages, and it was not obvious to me how to do this, so I would be 
grateful for any pointers.  I do not want to drop the overlapping genes entirely; instead I would 
like to remove only the regions of those genes that overlap and keep any unique regions.  One 
additional request: would it be possible for easyRNASeq to have an option to calculate TPM as 
well as RPKM (using the non-overlapping gene length)?  The reference for TPM is
http://www.ncbi.nlm.nih.gov/pubmed/22872506
TPM/RPKM would be useful for heatmaps and other display purposes.
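
To make the TPM request concrete, here is a minimal sketch in plain R of how the two measures 
relate, using the usual definitions (RPKM: reads per kilobase of feature per million counted 
reads; TPM: length-normalised rates rescaled to sum to one million).  The counts and lengths 
are made-up numbers and the variable names are hypothetical; nothing here is an existing 
easyRNASeq option.

```r
# Hypothetical per-gene read counts and non-overlapping gene lengths (bp).
counts <- c(geneA = 100, geneB = 200, geneC = 300)
len    <- c(geneA = 50,  geneB = 100, geneC = 101)

# RPKM: reads per kilobase of feature per million counted reads.
rpkm <- counts / (len / 1e3) / (sum(counts) / 1e6)

# TPM: reads per base, rescaled so the values sum to one million.
rate <- counts / len
tpm  <- rate / sum(rate) * 1e6
```

Within one sample the two measures are proportional to each other; the difference is that TPM 
always sums to one million, whereas the sum of RPKM values depends on the sample.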

Thanks,

Vince


library(easyRNASeq)
library(RnaSeqTutorial)
library(BSgenome.Dmelanogaster.UCSC.dm3)

count.table <- easyRNASeq(
    system.file("extdata", package="RnaSeqTutorial"),
    organism="Dmelanogaster",
    readLength=36L,
    annotationMethod="gff",
    annotationFile=system.file("extdata", "annot.gff",
                               package="RnaSeqTutorial"),
    gapped=TRUE,
    count="exons",
    filenames="gapped.bam")
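
The kind of annotation clean-up asked about above might look roughly like the following.  This 
is only a sketch on a toy GRanges object: the coordinates, the gene names and the `gene` 
metadata column are invented for illustration, strand is ignored on the assumption that the 
reads are unstranded, and it does not show how to pass the result to easyRNASeq.  It uses 
disjoin() and findOverlaps() as provided for GRanges by the GenomicRanges package.

```r
library(GenomicRanges)

# Toy annotation (hypothetical): geneA and geneB overlap on 150-200.
exons <- GRanges(
    seqnames = "chr2L",
    ranges   = IRanges(start = c(100, 150, 400), end = c(200, 300, 500)),
    gene     = c("geneA", "geneB", "geneC"))

# Split the annotation into the smallest set of non-overlapping pieces.
pieces <- disjoin(exons, ignore.strand = TRUE)

# For each piece, count how many distinct genes cover it.
hits     <- findOverlaps(pieces, exons, ignore.strand = TRUE)
hit.gene <- mcols(exons)$gene[subjectHits(hits)]
cnt      <- tapply(hit.gene, queryHits(hits),
                   function(g) length(unique(g)))
n.genes  <- integer(length(pieces))
n.genes[as.integer(names(cnt))] <- cnt

# Keep only the pieces covered by exactly one gene and label them.
keep <- n.genes == 1L
unique.pieces <- pieces[keep]
first.hit <- match(which(keep), queryHits(hits))
mcols(unique.pieces)$gene <- hit.gene[first.hit]

# Non-overlapping length per gene, e.g. as the length for RPKM/TPM.
unique.len <- tapply(width(unique.pieces),
                     mcols(unique.pieces)$gene, sum)
```

Counting distinct gene names, rather than overlapping ranges, keeps regions shared only by 
exons of the same gene, so a gene with several transcripts does not lose its own exons.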

> sessionInfo()
R version 2.15.2 (2012-10-26)
Platform: x86_64-pc-linux-gnu (64-bit)

locale:
  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
  [7] LC_PAPER=C                 LC_NAME=C
  [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods
[8] base

other attached packages:
  [1] BSgenome.Dmelanogaster.UCSC.dm3_1.3.19
  [2] RnaSeqTutorial_0.0.11
  [3] easyRNASeq_1.4.2
  [4] ShortRead_1.16.3
  [5] latticeExtra_0.6-24
  [6] RColorBrewer_1.0-5
  [7] Rsamtools_1.10.2
  [8] DESeq_1.10.1
  [9] lattice_0.20-13
[10] locfit_1.5-8
[11] BSgenome_1.26.1
[12] GenomicRanges_1.10.5
[13] Biostrings_2.26.2
[14] IRanges_1.16.4
[15] edgeR_3.0.8
[16] limma_3.14.3
[17] biomaRt_2.14.0
[18] Biobase_2.18.0
[19] genomeIntervals_1.14.0
[20] BiocGenerics_0.4.0
[21] intervals_0.13.3
[22] BiocInstaller_1.8.3

loaded via a namespace (and not attached):
  [1] annotate_1.36.0      AnnotationDbi_1.20.3 bitops_1.0-5
  [4] DBI_0.2-5            genefilter_1.40.0    geneplotter_1.36.0
  [7] grid_2.15.2          hwriter_1.3          RCurl_1.95-3
[10] RSQLite_0.11.2       splines_2.15.2       stats4_2.15.2
[13] survival_2.37-2      tools_2.15.2         XML_3.95-0.1
[16] xtable_1.7-0         zlibbioc_1.4.0


