[BioC] nearly exact overlaps between GAlignmentPairs and GenomicRanges

Georg Otto georg.otto at imm.ox.ac.uk
Fri Aug 22 19:40:06 CEST 2014



Michael Lawrence <michafla at gene.com> writes:

> But to get exactly what you want, we need to use type="within" with
> findOverlaps() and then filter the hits for cases where the overhang is
> below some tolerance (3bp):
>
> reads <- granges(galp) # simplifies things a bit
> hits <- findOverlaps(reads, ranges, type="within")
> overhang <- width(ranges[subjectHits(hits)]) - width(reads[queryHits(hits)])
> hits <- hits[overhang < 3L]
>
> Now we need the count vector:
>
> counts <- countSubjectHits(hits)
>
> Then it's just a simple exercise to insert that code into a function that
> conforms to the expectations of the "mode" parameter.
>


Thanks a lot for your help. I have problems, however, to wrap this into
a mode function. Here is what I have done:


> ex1_file <- system.file("extdata", "ex1.bam", package="Rsamtools")
> galp <- readGAlignmentPairs(ex1_file, use.names=TRUE)
> galp[3:4]
GAlignmentPairs with 2 alignment pairs and 0 metadata columns:
                       seqnames strand :    ranges --     ranges
                          <Rle>  <Rle> : <IRanges> --  <IRanges>
EAS54_65:3:321:311:983     seq1      + :  [51, 85] -- [228, 262]
   B7_591:5:42:540:501     seq1      + :  [60, 95] -- [224, 259]
---
seqlengths:
 seq1 seq2
 1575 1584

> ranges <- GRanges(c("seq1", "seq1"), IRanges(c(50, 57), end = c(263, 260)))
> ranges
GRanges with 2 ranges and 0 metadata columns:
      seqnames    ranges strand
         <Rle> <IRanges>  <Rle>
  [1]     seq1 [50, 263]      *
  [2]     seq1 [57, 260]      *
  ---
  seqlengths:
   seq1
     NA

> summary <-summarizeOverlaps(ranges, galp[3:4], mode = "IntersectionStrict",
+                             inter.feature = FALSE, fragments = FALSE)

Warning message:
In if (all(strand(reads) == "*")) ignore.strand <- TRUE :
  the condition has length > 1 and only the first element will be used

> assays(summary)$counts
     reads
[1,]     2
[2,]     1



However, I want the first feature only to be matched by galp[3] so I use
this function:


> intersectFun <-function (features, reads, ignore.strand = FALSE, inter.feature = TRUE) {   
+ 
+ reads <- granges(reads)
+ hits <- findOverlaps(reads, features, type="within")
+ overhang <- width(ranges[subjectHits(hits)]) - width(reads[queryHits(hits)])
+ hits <- hits[overhang < 3L]
+ counts <- countSubjectHits(hits)
+ return(counts)
+ }


> summary <-summarizeOverlaps(ranges, galp, mode = "intersectFun",
+                             inter.feature = FALSE, fragments = FALSE)

Error in (function (classes, fdef, mtable)  (from #3) : 
  unable to find an inherited method for function ‘granges’ for signature ‘"GRangesList"’
In addition: Warning message:
In if (all(strand(reads) == "*")) ignore.strand <- TRUE :
  the condition has length > 1 and only the first element will be used


Apparently, the summarizeOverlaps converts the input GAlignmentPairs
object into a GRangesList, which then can not be used by the mode
function. Sorry, I do not have a deep understanding of these data
structures. Could you help me with a workaround?

Best wishes,

Georg



More information about the Bioconductor mailing list