[BioC] flip strand of GappedAlignmentPairs Object

Hervé Pagès hpages at fhcrc.org
Wed Feb 27 20:23:21 CET 2013


Hi Stefanie, Val,

Makes sense to have a strand setter for GappedAlignmentPairs
objects, like we have for GappedAlignments objects. I've added
it to GenomicRanges devel:

   > galp
   GappedAlignmentPairs with 5 alignment pairs and 0 metadata columns:
                          seqnames strand :       ranges --       ranges
                             <Rle>  <Rle> :    <IRanges> --    <IRanges>
   EAS54_65:6:277:590:364     chr1      - : [ 681,  715] -- [ 503,  537]
   EAS139_11:5:61:38:1182     chr1      - : [1388, 1422] -- [1205, 1239]
    EAS188_7:4:21:443:404     chr1      + : [1345, 1379] -- [1529, 1563]
    EAS1_105:6:23:885:274     chr2      + : [1089, 1123] -- [1289, 1323]
     EAS1_108:1:65:787:74     chr1      - : [ 213,  247] -- [  61,   95]
   ---
   seqlengths:
    chr1 chr2
    1575 1584

   > strand(first(galp))
   factor-Rle of length 5 with 3 runs
     Lengths: 2 2 1
     Values : - + -
   Levels(3): + - *

   > strand(last(galp))
   factor-Rle of length 5 with 3 runs
     Lengths: 2 2 1
     Values : + - +
   Levels(3): + - *

Flip the strand:

   > strand(galp) <- ifelse(strand(galp) == "+", "-", "+")
   > galp
   GappedAlignmentPairs with 5 alignment pairs and 0 metadata columns:
                          seqnames strand :       ranges --       ranges
                           <Rle>  <Rle> :    <IRanges> --    <IRanges>
   EAS54_65:6:277:590:364     chr1      + : [ 681,  715] -- [ 503,  537]
   EAS139_11:5:61:38:1182     chr1      + : [1388, 1422] -- [1205, 1239]
    EAS188_7:4:21:443:404     chr1      - : [1345, 1379] -- [1529, 1563]
    EAS1_105:6:23:885:274     chr2      - : [1089, 1123] -- [1289, 1323]
     EAS1_108:1:65:787:74     chr1      + : [ 213,  247] -- [  61,   95]
   ---
   seqlengths:
    chr1 chr2
    1575 1584

   > strand(first(galp))
   factor-Rle of length 5 with 3 runs
     Lengths: 2 2 1
     Values : + - +
   Levels(3): + - *

   > strand(last(galp))
   factor-Rle of length 5 with 3 runs
     Lengths: 2 2 1
     Values : - + -
   Levels(3): + - *

Note that a slightly more efficient way to compute the flipped strand
is:

   runValue(strand(galp)) <- ifelse(runValue(strand(galp)) == "+",
                                    "-", "+")

This will only make a difference on big GappedAlignmentPairs objects of
course.

   > sessionInfo()
   R Under development (unstable) (2013-02-19 r62008)
   Platform: x86_64-unknown-linux-gnu (64-bit)

   locale:
    [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
    [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
    [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
    [7] LC_PAPER=C                 LC_NAME=C
    [9] LC_ADDRESS=C               LC_TELEPHONE=C
   [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

   attached base packages:
   [1] parallel  stats     graphics  grDevices utils     datasets  methods
   [8] base

   other attached packages:
   [1] Rsamtools_1.11.18     Biostrings_2.27.11    GenomicRanges_1.11.33
   [4] IRanges_1.17.35       BiocGenerics_0.5.6

   loaded via a namespace (and not attached):
   [1] bitops_1.0-5   stats4_3.0.0   tools_3.0.0    zlibbioc_1.5.0


GenomicRanges 1.11.33, will become available via biocLite(0 in the next
24 hours or so.

Cheers,
H.


On 02/27/2013 07:48 AM, Valerie Obenchain wrote:
> You could use the low-level constructor
>
>      GappedAlignmentPairs(first, last, isProperPair, names=NULL)
>
> 'isProperPair' is a logical vector the same length as 'first' and
> 'last'.  For example,
>
>      GappedAlignmentPairs(left, right, !logical(length(left)))
>
>
> Valerie
>
> On 02/27/13 07:12, Stefanie Tauber wrote:
>> Hi Valerie,
>>
>> thanks for the quick answer.
>> I  have already checked how the strand is assigned for paired-end data.
>> After setting the strand for the left and right parts,
>> can I somehow rebuild the gappedalignmentPairs object?
>>
>> best,
>> Stefanie
>>
>> Am 27.02.2013 um 16:06 schrieb Valerie Obenchain <vobencha at fhcrc.org
>> <mailto:vobencha at fhcrc.org>>:
>>
>>> Hi Stephanie,
>>>
>>> A GappedAlignmentPairs object is made up of a 'left' and a 'right'
>>> GappedAlignments.
>>>
>>> >  ex1_file <- system.file("extdata", "ex1.bam", package="Rsamtools")
>>> >      galp <- readGappedAlignmentPairs(ex1_file, use.names=TRUE)
>>>
>>> You can extract the left/right parts with an accessor then look at
>>> strand, cigars, gaps etc.
>>>
>>> > left <- left(galp)
>>> > class(left)
>>> [1] "GappedAlignments"
>>> attr(,"package")
>>> [1] "GenomicRanges"
>>>
>>> > strand(left)
>>> factor-Rle of length 1572 with 665 runs
>>>  Lengths:  3 25  1  1  1  3  1  1  3  2  2 ...  1  6  1  2  3  2 1  5
>>>  1 51
>>>  Values :  -  +  -  +  -  +  -  +  -  +  - ...  +  -  +  -  +  - +  -
>>>  +  -
>>> Levels(3): + - *
>>>
>>> > head(cigar(left))
>>> [1] "35M" "36M" "35M" "36M" "35M" "35M"
>>>
>>> > head(ngap(left))
>>> [1] 0 0 0 0 0 0
>>>
>>> It's on these left/right pairs that you can reset the strand.
>>>
>>> > strand(left)[1:3]
>>> factor-Rle of length 3 with 1 run
>>>  Lengths: 3
>>>  Values : +
>>> Levels(3): + - *
>>> > strand(left)[1:3] <- "-"
>>> > strand(left)[1:3]
>>> factor-Rle of length 3 with 1 run
>>>  Lengths: 3
>>>  Values : -
>>> Levels(3): + - *
>>>
>>> Please be sure to read the details on the man page regarding how the
>>> strands were assigned.
>>>
>>> ?GappedAlignmentPairs
>>>
>>>
>>> Valerie
>>>
>>> On 02/27/13 05:55, Stefanie Tauber wrote:
>>>> Dear list,
>>>>
>>>> Let's say I have single-end strand-specific RNA-Seq data.
>>>> Then I could read in the data with:
>>>>
>>>> library(GenomicRanges)
>>>>
>>>>> reads = readGappedAlignments("data.bam")
>>>> As I typically don't know which strand has been sequenced, it might
>>>> be necessary to flip the strand of the reads before doing any kind
>>>> of summarizeOverlap:
>>>>
>>>>> strand(reads) = ifelse(strand(reads) == "+", "-", "+")
>>>> Now I could do a 'summarizeOverlaps' between the reads and any kind
>>>> of transcript database.
>>>>
>>>> Now, if I have paired-end strand-specific RNA-Seq data.
>>>> I read the data:
>>>>> reads = readGappedAlignmentPairs("data.bam")
>>>> How can I flip the strand of the reads? As it's now paired-end data,
>>>> it does not work as above.
>>>> I could imagine several work-arounds, I just wonder if there is any
>>>> straight approach which I miss at the moment..
>>>>
>>>>
>>>> best regards,
>>>> Stefanie
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> [[alternative HTML version deleted]]
>>>>
>>>> _______________________________________________
>>>> Bioconductor mailing list
>>>> Bioconductor at r-project.org <mailto:Bioconductor at r-project.org>
>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>>> Search the archives:
>>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>>
>>
>> DI Stefanie Tauber
>>
>> Center for Integrative Bioinformatics Vienna (CIBIV)
>> (CIBIV is a joint institute of Vienna University and Medical University)
>> Max F. Perutz Laboratories (MFPL)
>> Campus Vienna Biocenter 5 (VBC5), Ebene 1, Room 1812.2
>> Dr. Bohr Gasse 9
>> A-1030 Wien, Austria
>> Phone: ++43 +1 / 42772-4030
>> Fax:     ++43 +1 / 42772-4098
>> email: stefanie.tauber at univie.ac.at <mailto:stefanie.tauber at univie.ac.at>
>> www.cibiv.at <http://www.cibiv.at>
>>
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives:
> http://news.gmane.org/gmane.science.biology.informatics.conductor

-- 
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpages at fhcrc.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319



More information about the Bioconductor mailing list