[BioC] nearly exact overlaps between GAlignmentPairs and GenomicRanges
Michael Lawrence
michafla at gene.com
Fri Aug 22 20:06:36 CEST 2014
Since we're going for the bounds of the paired alignments, use
reads <- unlist(range(reads), use.names=FALSE)
This assumes that you do not have cases where members of the same pair map
to different chromosomes.
On Fri, Aug 22, 2014 at 10:40 AM, Georg Otto <georg.otto at imm.ox.ac.uk>
wrote:
>
>
> Michael Lawrence <michafla at gene.com> writes:
>
> > But to get exactly what you want, we need to use type="within" with
> > findOverlaps() and then filter the hits for cases where the overhang is
> > below some tolerance (3bp):
> >
> > reads <- granges(galp) # simplifies things a bit
> > hits <- findOverlaps(reads, ranges, type="within")
> > overhang <- width(ranges[subjectHits(hits)]) -
> width(reads[queryHits(hits)])
> > hits <- hits[overhang < 3L]
> >
> > Now we need the count vector:
> >
> > counts <- countSubjectHits(hits)
> >
> > Then it's just a simple exercise to insert that code into a function that
> > conforms to the expectations of the "mode" parameter.
> >
>
>
> Thanks a lot for your help. I have problems, however, to wrap this into
> a mode function. Here is what I have done:
>
>
> > ex1_file <- system.file("extdata", "ex1.bam", package="Rsamtools")
> > galp <- readGAlignmentPairs(ex1_file, use.names=TRUE)
> > galp[3:4]
> GAlignmentPairs with 2 alignment pairs and 0 metadata columns:
> seqnames strand : ranges -- ranges
> <Rle> <Rle> : <IRanges> -- <IRanges>
> EAS54_65:3:321:311:983 seq1 + : [51, 85] -- [228, 262]
> B7_591:5:42:540:501 seq1 + : [60, 95] -- [224, 259]
> ---
> seqlengths:
> seq1 seq2
> 1575 1584
>
> > ranges <- GRanges(c("seq1", "seq1"), IRanges(c(50, 57), end = c(263,
> 260)))
> > ranges
> GRanges with 2 ranges and 0 metadata columns:
> seqnames ranges strand
> <Rle> <IRanges> <Rle>
> [1] seq1 [50, 263] *
> [2] seq1 [57, 260] *
> ---
> seqlengths:
> seq1
> NA
>
> > summary <-summarizeOverlaps(ranges, galp[3:4], mode =
> "IntersectionStrict",
> + inter.feature = FALSE, fragments = FALSE)
>
> Warning message:
> In if (all(strand(reads) == "*")) ignore.strand <- TRUE :
> the condition has length > 1 and only the first element will be used
>
> > assays(summary)$counts
> reads
> [1,] 2
> [2,] 1
>
>
>
> However, I want the first feature only to be matched by galp[3] so I use
> this function:
>
>
> > intersectFun <-function (features, reads, ignore.strand = FALSE,
> inter.feature = TRUE) {
> +
> + reads <- granges(reads)
> + hits <- findOverlaps(reads, features, type="within")
> + overhang <- width(ranges[subjectHits(hits)]) -
> width(reads[queryHits(hits)])
> + hits <- hits[overhang < 3L]
> + counts <- countSubjectHits(hits)
> + return(counts)
> + }
>
>
> > summary <-summarizeOverlaps(ranges, galp, mode = "intersectFun",
> + inter.feature = FALSE, fragments = FALSE)
>
> Error in (function (classes, fdef, mtable) (from #3) :
> unable to find an inherited method for function ‘granges’ for signature
> ‘"GRangesList"’
> In addition: Warning message:
> In if (all(strand(reads) == "*")) ignore.strand <- TRUE :
> the condition has length > 1 and only the first element will be used
>
>
> Apparently, the summarizeOverlaps converts the input GAlignmentPairs
> object into a GRangesList, which then can not be used by the mode
> function. Sorry, I do not have a deep understanding of these data
> structures. Could you help me with a workaround?
>
> Best wishes,
>
> Georg
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives:
> http://news.gmane.org/gmane.science.biology.informatics.conductor
>
[[alternative HTML version deleted]]
More information about the Bioconductor
mailing list