[BioC] Fwd: summarizeOverlaps behavior on RNA-Seq paired end stranded data

Valerie Obenchain vobencha at fhcrc.org
Sat Oct 5 01:45:39 CEST 2013


Hi,

On 10/03/2013 12:16 PM, Abhishek Pratap wrote:
> Hi Valerie
>
> Here are follow up questions. Let me ask them through examples.
>
>
> #creating the feature
> gene1 <- GRanges("chr1", IRanges(c(1,40),c(20,60)),"+")
> annotation <- GRangesList("gene1"=gene1)
>
>
> 1.  even if one mate of the read falls outside the annotation feature it
> is still counted ..

Yes. This is intended behavior.

> r1 <- GAlignments("chr1",1L,"10M",strand("+"))
> r2 <- GAlignments("chr1",65L,"10M",strand("-"))
> pair <- GAlignmentPairs(r1,r2,isProperPair=TRUE)
> result <- summarizeOverlaps(annotation,pair,ignore.strand=TRUE)
>  > assays(result)$count
>        reads
> gene1     1
>
>
> 2.  if the read 1 is on the opp strand compared to annotation (which can
> happen in second strand RNA-Seq protocol) the counting is not proper
> unless we ignore the strand. I guess the counting is only done based on
> the strand of the read 1 and ignores the read 2 strand ...?

Yes, overlaps is done based on the strand of the first read (aka first 
segment in template). How pairs are paired and how strand is determined 
is described in detail on the man page ?GAlignmentPairs 
(?GappedAlignmentPairs in release). Specifically, see the 'Details' and 
section on 'strand(x)'.

> r1 <- GAlignments("chr1",1L,"10M",strand("-"))
> r2 <- GAlignments("chr1",65L,"10M",strand("+"))
> pair <- GAlignmentPairs(r1,r2,isProperPair=TRUE)
> result <- summarizeOverlaps(annotation,pair,ignore.strand=FALSE)
>  > assays(result)$count
>        reads
> gene1     0
>
>
> I am a bit skeptical if we should turn on the non stranded counting for
> stranded paired RNA-Seq data. I guess a proper RNA-Seq read pair should
> have reads on the opposite strand for it be counted. They can be either
> +- or -+ depending on the protocol and checking this sanity as part of
> the counting process would help.

Part of the pairing process does involve checking that mates came from 
opposite strands. See these pages for details of the pairing algorithm.

?readBamGappedAlignmentPairs
?findMateAlignment

The question of exactly how stranded paired RNA-Seq data should be 
counted is better answered by others on the list. Specifically, should 
you be looking for overlap on the '+' or '-' or both strands.

>
> I understand this conversation could lead to confusion over email. Let
> me know if you want to meet in person and discuss.

Sure, stop by anytime if you want to discuss in person.

Valerie

>
> Cheers!
> -Abhi
>
>
> On Thu, Oct 3, 2013 at 8:27 AM, Valerie Obenchain <vobencha at fhcrc.org
> <mailto:vobencha at fhcrc.org>> wrote:
>
>     On 10/02/2013 04:31 PM, Abhishek Pratap wrote:
>
>         Hi
>
>
>         On Wed, Oct 2, 2013 at 10:28 AM, Valerie Obenchain
>         <vobencha at fhcrc.org <mailto:vobencha at fhcrc.org>
>         <mailto:vobencha at fhcrc.org <mailto:vobencha at fhcrc.org>>> wrote:
>
>              Hi,
>
>
>              On 10/02/2013 12:35 AM, Abhishek Pratap wrote:
>
>                  Hi All
>
>                  Just wanted to check on the expected behavior of the
>                  summarizeOverlaps
>                  function in the GenomicRanges package for the following
>         cases.
>
>
>                  1. In the case of paired end data does it count each
>         read pair
>                  as 1 or 2
>                  against a feature.
>
>
>              Pairs are counted as a single 'hit' reguardless if one or
>         both mates
>              overlap the feature.
>
>
>
>                  2. For stranded protocols of RNA-Seq does it take it
>         account the
>                  opposite
>                  strand of the mate of a read when counting or does it
>         take only
>                  one read
>                  matching the strand into consideration.
>
>
>              When counting, pairs are treated as though they are from
>         the same
>              strand. In general, you can ignore the strand when counting by
>              setting the 'ignore.strand=TRUE'.
>
>
>
>                  3. For second strand RNA-Seq protocol where the read-1
>         matches
>                  to the
>                  opposite strand of gene will summarizeOverlap work ?
>
>
>              I'm not sure what you mean. Please provide an example.
>
>
>              summarizeOverlaps() reads the data from a BAM into a
>         GAlignmentPairs
>              or GAlignmentsList container for counting. You can
>         investigate the
>              behavior using a small test case.
>
>              ## paired-end record
>              ga1 <- GAlignments("chr1", 1L, "10M", strand("+"))
>              ga2 <- GAlignments("chr1", 15L, "11M", strand("-"))
>              galp <- GAlignmentPairs(ga1, ga2, TRUE)
>
>              ## annotation
>              ann <- GRanges("chr1", IRanges(c(1, 5, 12, 20), c(25, 20,
>         14, 30)), "-")
>
>              ## all with mode="Union"
>              se1 <- summarizeOverlaps(ann[1], galp, ignore.strand=TRUE)
>               > assays(se1)$counts
>                    reads
>              [1,]     1
>              se2 <- summarizeOverlaps(ann[1], galp, ignore.strand=FALSE)
>               > assays(se2)$counts
>                    reads
>              [1,]     0
>              se3 <- summarizeOverlaps(ann[4], galp, ignore.strand=TRUE)
>               > assays(se3)$counts
>                    reads
>              [1,]     1
>
>
>              Valerie
>
>
>         I think thats a good idea to test it on a artificial set. Just a
>         quick
>         question. I cant seem to use the constructor GAlingments(). Is that
>         something to do with the version ?
>
>
>         library("GenomicRanges")
>         library("Rsamtools")
>         GAlignments("chr1",1L,"10M",__strand("+"))
>           >Error: could not find function "GAlignments"
>
>
>     The class is called GappedAlignments in release and GAlignments in
>     devel (name changed this cycle). For an example of constructing one,
>     see the summarizeOverlaps man page.
>
>     ?summarizeOverlaps
>
>
>     Valerie
>
>
>
>
>         Partial output from sessionInfo()
>         other attached packages:
>         [1] Rsamtools_1.12.4     Biostrings_2.28.0    GenomicRanges_1.12.5
>         IRanges_1.18.4       BiocGenerics_0.6.0
>
>
>         Thanks!
>         -Abhi
>
>
>
>                  Thanks!
>                  -Abhi
>
>                           [[alternative HTML version deleted]]
>
>                  ___________________________________________________
>                  Bioconductor mailing list
>         Bioconductor at r-project.org <mailto:Bioconductor at r-project.org>
>         <mailto:Bioconductor at r-__project.org
>         <mailto:Bioconductor at r-project.org>>
>         https://stat.ethz.ch/mailman/____listinfo/bioconductor
>         <https://stat.ethz.ch/mailman/__listinfo/bioconductor>
>
>                  <https://stat.ethz.ch/mailman/__listinfo/bioconductor
>         <https://stat.ethz.ch/mailman/listinfo/bioconductor>>
>                  Search the archives:
>         http://news.gmane.org/gmane.____science.biology.informatics.____conductor
>         <http://news.gmane.org/gmane.__science.biology.informatics.__conductor>
>
>         <http://news.gmane.org/gmane.__science.biology.informatics.__conductor
>         <http://news.gmane.org/gmane.science.biology.informatics.conductor>>
>
>
>
>
>



More information about the Bioconductor mailing list