[BioC] summarizeOverlaps mode ignoring inter feature overlaps

Martin Morgan mtmorgan at fhcrc.org
Sat Apr 13 01:40:15 CEST 2013


On 04/12/2013 03:57 PM, Wei Shi wrote:
>
> On Apr 13, 2013, at 12:08 AM, Martin Morgan wrote:
>
>> On 04/12/2013 06:53 AM, Michael Lawrence wrote:
>>> Hi Wei,
>>>
>>> summarizeSpliceOverlaps does not yet exist. We held off on that until we
>>> determined whether findSpliceOverlaps was useful. And yes,
>>> findSpliceOverlaps counts reads on a per-feature basis, so it will
>>> double count reads in that way. That's totally intentional (and I'm
>>> pretty sure what Thomas was wanting).
>>
>> also the object returned can be easily manipulated?
>>
>>> olaps = findSpliceOverlaps(galp, genes) olaps
>> Hits of length 2 queryLength: 1 subjectLength: 2 queryHits subjectHits
>> compatible    unique    coding strandSpecific <integer>   <integer>
>> <logical> <logical> <logical>      <logical> 1         1           1
>> FALSE     FALSE        NA           TRUE 2         1           2      FALSE
>> FALSE        NA           TRUE
>>> olaps[mcols(olaps)$unique]
>> Hits of length 0 queryLength: 1 subjectLength: 2
>>
>> and canonically
>>
>> ulaps = olaps[mcols(olaps)$unique] tabulate(subjectHits(ulaps),
>> subjectLength(ulaps))
>>
>> see ?Hits
>>
>> Martin
>
> But I got no fragments assigned to transcripts. Is this what you meant to
> do?
>
>> ulaps = olaps[mcols(olaps)$unique] tabulate(subjectHits(ulaps),
>> subjectLength(ulaps))
> [1] 0 0

I meant only to illustrate that the object is easily manipulated to implement
different counting schemes. You said

> The findSpliceOverlaps function seems to count reads more than once as well.
> I ran the sample code in the help page for this function and found that the
> single fragment was assigned to both transcripts:

and indeed the function returns all 'hits' in the sense of non-zero overlap 
between read and gene. You're interested in counting 'no more than once', and 
the transcript does not align uniquely (or in a way that is compatible with each 
gene model), so yes, it seems reasonable to count 0 hits for each feature.

On the other hand

      galp <- GAlignmentPairs(
          GAlignments("chr1", 5L, "6M", strand("+")),
          GAlignments("chr1", 20L, "6M", strand("-")),
          isProperPair=TRUE)

and

 >      (olaps <- findSpliceOverlaps(galp, genes))
Hits of length 2
queryLength: 1
subjectLength: 2
   queryHits subjectHits compatible    unique    coding strandSpecific
    <integer>   <integer>  <logical> <logical> <logical>      <logical>
  1         1           1       TRUE      TRUE        NA           TRUE
  2         1           2      FALSE     FALSE        NA           TRUE
 >      (ulaps <- olaps[mcols(olaps)$unique])
Hits of length 1
queryLength: 1
subjectLength: 2
   queryHits subjectHits compatible    unique    coding strandSpecific
    <integer>   <integer>  <logical> <logical> <logical>      <logical>
  1         1           1       TRUE      TRUE        NA           TRUE
 >      tabulate(subjectHits(ulaps), subjectLength(ulaps))
[1] 1 0

Martin


>
> Cheers Wei
>
>
> ______________________________________________________________________ The
> information in this email is confidential and intended solely for the
> addressee. You must not disclose, forward, print or use it without the
> permission of the sender.
> ______________________________________________________________________
>


-- 
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M1 B861
Phone: (206) 667-2793



More information about the Bioconductor mailing list