[BioC] nearly exact overlaps between GAlignmentPairs and GenomicRanges

Michael Lawrence michafla at gene.com
Fri Aug 22 20:06:36 CEST 2014


Since we're going for the bounds of the paired alignments, use

reads <- unlist(range(reads), use.names=FALSE)

This assumes that you do not have cases where members of the same pair map
to different chromosomes.


On Fri, Aug 22, 2014 at 10:40 AM, Georg Otto <georg.otto at imm.ox.ac.uk>
wrote:

>
>
> Michael Lawrence <michafla at gene.com> writes:
>
> > But to get exactly what you want, we need to use type="within" with
> > findOverlaps() and then filter the hits for cases where the overhang is
> > below some tolerance (3bp):
> >
> > reads <- granges(galp) # simplifies things a bit
> > hits <- findOverlaps(reads, ranges, type="within")
> > overhang <- width(ranges[subjectHits(hits)]) -
> width(reads[queryHits(hits)])
> > hits <- hits[overhang < 3L]
> >
> > Now we need the count vector:
> >
> > counts <- countSubjectHits(hits)
> >
> > Then it's just a simple exercise to insert that code into a function that
> > conforms to the expectations of the "mode" parameter.
> >
>
>
> Thanks a lot for your help. I have problems, however, to wrap this into
> a mode function. Here is what I have done:
>
>
> > ex1_file <- system.file("extdata", "ex1.bam", package="Rsamtools")
> > galp <- readGAlignmentPairs(ex1_file, use.names=TRUE)
> > galp[3:4]
> GAlignmentPairs with 2 alignment pairs and 0 metadata columns:
>                        seqnames strand :    ranges --     ranges
>                           <Rle>  <Rle> : <IRanges> --  <IRanges>
> EAS54_65:3:321:311:983     seq1      + :  [51, 85] -- [228, 262]
>    B7_591:5:42:540:501     seq1      + :  [60, 95] -- [224, 259]
> ---
> seqlengths:
>  seq1 seq2
>  1575 1584
>
> > ranges <- GRanges(c("seq1", "seq1"), IRanges(c(50, 57), end = c(263,
> 260)))
> > ranges
> GRanges with 2 ranges and 0 metadata columns:
>       seqnames    ranges strand
>          <Rle> <IRanges>  <Rle>
>   [1]     seq1 [50, 263]      *
>   [2]     seq1 [57, 260]      *
>   ---
>   seqlengths:
>    seq1
>      NA
>
> > summary <-summarizeOverlaps(ranges, galp[3:4], mode =
> "IntersectionStrict",
> +                             inter.feature = FALSE, fragments = FALSE)
>
> Warning message:
> In if (all(strand(reads) == "*")) ignore.strand <- TRUE :
>   the condition has length > 1 and only the first element will be used
>
> > assays(summary)$counts
>      reads
> [1,]     2
> [2,]     1
>
>
>
> However, I want the first feature only to be matched by galp[3] so I use
> this function:
>
>
> > intersectFun <-function (features, reads, ignore.strand = FALSE,
> inter.feature = TRUE) {
> +
> + reads <- granges(reads)
> + hits <- findOverlaps(reads, features, type="within")
> + overhang <- width(ranges[subjectHits(hits)]) -
> width(reads[queryHits(hits)])
> + hits <- hits[overhang < 3L]
> + counts <- countSubjectHits(hits)
> + return(counts)
> + }
>
>
> > summary <-summarizeOverlaps(ranges, galp, mode = "intersectFun",
> +                             inter.feature = FALSE, fragments = FALSE)
>
> Error in (function (classes, fdef, mtable)  (from #3) :
>   unable to find an inherited method for function ‘granges’ for signature
> ‘"GRangesList"’
> In addition: Warning message:
> In if (all(strand(reads) == "*")) ignore.strand <- TRUE :
>   the condition has length > 1 and only the first element will be used
>
>
> Apparently, the summarizeOverlaps converts the input GAlignmentPairs
> object into a GRangesList, which then can not be used by the mode
> function. Sorry, I do not have a deep understanding of these data
> structures. Could you help me with a workaround?
>
> Best wishes,
>
> Georg
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives:
> http://news.gmane.org/gmane.science.biology.informatics.conductor
>

	[[alternative HTML version deleted]]



More information about the Bioconductor mailing list