[BioC] summarizeOverlaps, inter.feature=FALSE and mode="IntersectionNotEmpty"

Thomas Girke thomas.girke at ucr.edu
Wed May 15 19:04:27 CEST 2013


I agree, given the definition of "IntersectionNotEmpty", the count for
that read in row 6 should be 1 for Feature1 and 0 for Feature2, whereas
in row 7 it should be 0 for both. This also means, in this specific case
there will be no difference in the read assignment (count results) when
inter.feature is set to FALSE or TRUE. I guess, maining consistent
feature defintions in the different counting modes will be certainly
important. Thanks Michael for pointing this out.

Thomas

On Wed, May 15, 2013 at 03:47:46PM +0000, Valerie Obenchain wrote:
> I agree Mike. Following this same logic, row 7 for IntersectionNotEmpty 
> should be 0,0 instead of 1,1.
> 
> Thomas, Martin, would you agree?
> 
> Valerie
> 
> 
> On 05/15/2013 04:47 AM, Michael Love wrote:
> > hi,
> >
> > The new arguments 'inter.feature' and 'fragments' of summarizeOverlaps look
> > very useful.
> >
> > I'm wondering about the behavior for inter.feature=FALSE and
> > mode="IntersectionNotEmpty".
> >
> > Earlier on the list, from Martin Morgan, there was the proposed behavior:
> >
> > You'd like to add 3 more modes, with inter_feature=FALSE. Reads are sometimes
> > 'counted twice'
> >
> > |*----------------------+-----+-----------+-----------+---------------|*|*
> > Mode                 | Row | Feature 1 | Feature 2 | Hits per read
> > |*|*----------------------+-----+-----------+-----------+---------------|*|*
> > Union                |   5 |         1 |         0 |             1
> > |*|*                      |   6 |         1 |         1 |
> > 2 |*|*                      |   7 |         1 |         1 |
> >   2 |*|* IntersectionStrict   |   5 |         1 |         0 |
> >    1 |*|*                      |   6 |         1 |         0 |
> >     1 |*|*                      |   7 |         1 |         1 |
> >      2 |*|* IntersectionNotEmpty |   5 |         1 |         0 |
> >       1 |*|*                      |   6 |         1 |         1 |
> >        2 |*|*                      |   7 |         1 |         1 |
> >         2 |*|*----------------------+-----+-----------+-----------+---------------|*
> >
> > ( https://stat.ethz.ch/pipermail/bioconductor/2013-April/052064.html )
> >
> > For row 6 of this diagram (
> > http://bioconductor.org/packages/2.13/bioc/vignettes/GenomicRanges/inst/doc/summarizeOverlaps-modes.pdf)
> > and 'IntersectionNotEmpty', I don't see why Feature 2 gets a hit.
> >
> > The man page has:
> >
> > IntersectionNotEmpty : For this counting mode, the features are partitioned
> > into unique disjoint regions. This is accomplished by disjoining the
> > feature ranges then removing ranges shared by more than one feature. The
> > result is a group of non-overlapping regions each of which belong to a
> > single feature.
> > Simple and gapped reads are counted if,
> > 1. the read or exactly 1 of the read fragments overlaps a unique disjoint
> > region
> > 2. the read or >1 read fragments overlap >1 unique disjoint region from
> > the same feature
> >
> > An example, where the read falls completely within Feature 1 but partially
> > in Feature 2, the behavior is consistent with the table above:
> >
> >> reads <-
> > GAlignments(seqnames="chr1",pos=1400L,cigar="500M",strand=strand("+"))
> >> features <- GRanges("chr1",IRanges(c(1300L,1700L),c(2000L,2400L)))
> >> assay(summarizeOverlaps(features,reads,mode="IntersectionNotEmpty",
> > inter.feature=FALSE))
> >       [,1]
> > [1,]    1
> > [2,]    1
> >
> >> disjoin(features)
> > GRanges with 3 ranges and 0 metadata columns:
> >        seqnames       ranges strand
> >           <Rle>    <IRanges>  <Rle>
> >    [1]     chr1 [1300, 1699]      *
> >    [2]     chr1 [1700, 2000]      *
> >    [3]     chr1 [2001, 2400]      *
> >    ---
> >    seqlengths:
> >     chr1
> >       NA
> >
> >> findOverlaps(reads, disjoin(features))
> > Hits of length 2
> > queryLength: 1
> > subjectLength: 3
> >    queryHits subjectHits
> >     <integer>   <integer>
> >   1         1           1
> >   2         1           2
> >
> > So the example read overlaps the unique region of Feature 1 and the shared
> > region, which supposedly has been removed. So I would expect it to only
> > count to Feature 1.
> >
> >> sessionInfo()
> > R Under development (unstable) (2013-05-14 r62742)
> > Platform: x86_64-unknown-linux-gnu (64-bit)
> >
> > locale:
> > [1] C
> >
> > attached base packages:
> > [1] parallel  stats     graphics  grDevices utils     datasets  methods
> > [8] base
> >
> > other attached packages:
> > [1] GenomicRanges_1.13.11 XVector_0.0.0         IRanges_1.19.7
> > [4] BiocGenerics_0.7.2    BiocInstaller_1.11.1  Defaults_1.1-1
> >
> > loaded via a namespace (and not attached):
> > [1] stats4_3.1.0
> >
> >
> > thanks,
> >
> > Mike
> >
> > 	[[alternative HTML version deleted]]
> >
> > _______________________________________________
> > Bioconductor mailing list
> > Bioconductor at r-project.org
> > https://stat.ethz.ch/mailman/listinfo/bioconductor
> > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
> >
> 
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor



More information about the Bioconductor mailing list