[BioC] flip strand of GappedAlignmentPairs Object

Valerie Obenchain vobencha at fhcrc.org
Wed Feb 27 20:47:06 CET 2013


Thanks Herve.

On 02/27/13 11:23, Hervé Pagès wrote:
> Hi Stefanie, Val,
>
> Makes sense to have a strand setter for GappedAlignmentPairs
> objects, like we have for GappedAlignments objects. I've added
> it to GenomicRanges devel:
>
>   > galp
>   GappedAlignmentPairs with 5 alignment pairs and 0 metadata columns:
>                          seqnames strand :       ranges -- ranges
>                             <Rle>  <Rle> : <IRanges> --    <IRanges>
>   EAS54_65:6:277:590:364     chr1      - : [ 681,  715] -- [ 503, 537]
>   EAS139_11:5:61:38:1182     chr1      - : [1388, 1422] -- [1205, 1239]
>    EAS188_7:4:21:443:404     chr1      + : [1345, 1379] -- [1529, 1563]
>    EAS1_105:6:23:885:274     chr2      + : [1089, 1123] -- [1289, 1323]
>     EAS1_108:1:65:787:74     chr1      - : [ 213,  247] -- [ 61,   95]
>   ---
>   seqlengths:
>    chr1 chr2
>    1575 1584
>
>   > strand(first(galp))
>   factor-Rle of length 5 with 3 runs
>     Lengths: 2 2 1
>     Values : - + -
>   Levels(3): + - *
>
>   > strand(last(galp))
>   factor-Rle of length 5 with 3 runs
>     Lengths: 2 2 1
>     Values : + - +
>   Levels(3): + - *
>
> Flip the strand:
>
>   > strand(galp) <- ifelse(strand(galp) == "+", "-", "+")
>   > galp
>   GappedAlignmentPairs with 5 alignment pairs and 0 metadata columns:
>                          seqnames strand :       ranges -- ranges
>                           <Rle>  <Rle> : <IRanges> --    <IRanges>
>   EAS54_65:6:277:590:364     chr1      + : [ 681,  715] -- [ 503, 537]
>   EAS139_11:5:61:38:1182     chr1      + : [1388, 1422] -- [1205, 1239]
>    EAS188_7:4:21:443:404     chr1      - : [1345, 1379] -- [1529, 1563]
>    EAS1_105:6:23:885:274     chr2      - : [1089, 1123] -- [1289, 1323]
>     EAS1_108:1:65:787:74     chr1      + : [ 213,  247] -- [ 61,   95]
>   ---
>   seqlengths:
>    chr1 chr2
>    1575 1584
>
>   > strand(first(galp))
>   factor-Rle of length 5 with 3 runs
>     Lengths: 2 2 1
>     Values : + - +
>   Levels(3): + - *
>
>   > strand(last(galp))
>   factor-Rle of length 5 with 3 runs
>     Lengths: 2 2 1
>     Values : - + -
>   Levels(3): + - *
>
> Note that a slightly more efficient way to compute the flipped strand
> is:
>
>   runValue(strand(galp)) <- ifelse(runValue(strand(galp)) == "+",
>                                    "-", "+")
>
> This will only make a difference on big GappedAlignmentPairs objects of
> course.
>
>   > sessionInfo()
>   R Under development (unstable) (2013-02-19 r62008)
>   Platform: x86_64-unknown-linux-gnu (64-bit)
>
>   locale:
>    [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
>    [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
>    [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
>    [7] LC_PAPER=C                 LC_NAME=C
>    [9] LC_ADDRESS=C               LC_TELEPHONE=C
>   [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>
>   attached base packages:
>   [1] parallel  stats     graphics  grDevices utils     datasets methods
>   [8] base
>
>   other attached packages:
>   [1] Rsamtools_1.11.18     Biostrings_2.27.11 GenomicRanges_1.11.33
>   [4] IRanges_1.17.35       BiocGenerics_0.5.6
>
>   loaded via a namespace (and not attached):
>   [1] bitops_1.0-5   stats4_3.0.0   tools_3.0.0    zlibbioc_1.5.0
>
>
> GenomicRanges 1.11.33, will become available via biocLite(0 in the next
> 24 hours or so.
>
> Cheers,
> H.
>
>
> On 02/27/2013 07:48 AM, Valerie Obenchain wrote:
>> You could use the low-level constructor
>>
>>      GappedAlignmentPairs(first, last, isProperPair, names=NULL)
>>
>> 'isProperPair' is a logical vector the same length as 'first' and
>> 'last'.  For example,
>>
>>      GappedAlignmentPairs(left, right, !logical(length(left)))
>>
>>
>> Valerie
>>
>> On 02/27/13 07:12, Stefanie Tauber wrote:
>>> Hi Valerie,
>>>
>>> thanks for the quick answer.
>>> I  have already checked how the strand is assigned for paired-end data.
>>> After setting the strand for the left and right parts,
>>> can I somehow rebuild the gappedalignmentPairs object?
>>>
>>> best,
>>> Stefanie
>>>
>>> Am 27.02.2013 um 16:06 schrieb Valerie Obenchain <vobencha at fhcrc.org
>>> <mailto:vobencha at fhcrc.org>>:
>>>
>>>> Hi Stephanie,
>>>>
>>>> A GappedAlignmentPairs object is made up of a 'left' and a 'right'
>>>> GappedAlignments.
>>>>
>>>> >  ex1_file <- system.file("extdata", "ex1.bam", package="Rsamtools")
>>>> >      galp <- readGappedAlignmentPairs(ex1_file, use.names=TRUE)
>>>>
>>>> You can extract the left/right parts with an accessor then look at
>>>> strand, cigars, gaps etc.
>>>>
>>>> > left <- left(galp)
>>>> > class(left)
>>>> [1] "GappedAlignments"
>>>> attr(,"package")
>>>> [1] "GenomicRanges"
>>>>
>>>> > strand(left)
>>>> factor-Rle of length 1572 with 665 runs
>>>>  Lengths:  3 25  1  1  1  3  1  1  3  2  2 ...  1  6  1  2 3  2 1  5
>>>>  1 51
>>>>  Values :  -  +  -  +  -  +  -  +  -  +  - ...  +  -  +  - +  - +  -
>>>>  +  -
>>>> Levels(3): + - *
>>>>
>>>> > head(cigar(left))
>>>> [1] "35M" "36M" "35M" "36M" "35M" "35M"
>>>>
>>>> > head(ngap(left))
>>>> [1] 0 0 0 0 0 0
>>>>
>>>> It's on these left/right pairs that you can reset the strand.
>>>>
>>>> > strand(left)[1:3]
>>>> factor-Rle of length 3 with 1 run
>>>>  Lengths: 3
>>>>  Values : +
>>>> Levels(3): + - *
>>>> > strand(left)[1:3] <- "-"
>>>> > strand(left)[1:3]
>>>> factor-Rle of length 3 with 1 run
>>>>  Lengths: 3
>>>>  Values : -
>>>> Levels(3): + - *
>>>>
>>>> Please be sure to read the details on the man page regarding how the
>>>> strands were assigned.
>>>>
>>>> ?GappedAlignmentPairs
>>>>
>>>>
>>>> Valerie
>>>>
>>>> On 02/27/13 05:55, Stefanie Tauber wrote:
>>>>> Dear list,
>>>>>
>>>>> Let's say I have single-end strand-specific RNA-Seq data.
>>>>> Then I could read in the data with:
>>>>>
>>>>> library(GenomicRanges)
>>>>>
>>>>>> reads = readGappedAlignments("data.bam")
>>>>> As I typically don't know which strand has been sequenced, it might
>>>>> be necessary to flip the strand of the reads before doing any kind
>>>>> of summarizeOverlap:
>>>>>
>>>>>> strand(reads) = ifelse(strand(reads) == "+", "-", "+")
>>>>> Now I could do a 'summarizeOverlaps' between the reads and any kind
>>>>> of transcript database.
>>>>>
>>>>> Now, if I have paired-end strand-specific RNA-Seq data.
>>>>> I read the data:
>>>>>> reads = readGappedAlignmentPairs("data.bam")
>>>>> How can I flip the strand of the reads? As it's now paired-end data,
>>>>> it does not work as above.
>>>>> I could imagine several work-arounds, I just wonder if there is any
>>>>> straight approach which I miss at the moment..
>>>>>
>>>>>
>>>>> best regards,
>>>>> Stefanie
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> [[alternative HTML version deleted]]
>>>>>
>>>>> _______________________________________________
>>>>> Bioconductor mailing list
>>>>> Bioconductor at r-project.org <mailto:Bioconductor at r-project.org>
>>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>>>> Search the archives:
>>>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>>>
>>>
>>> DI Stefanie Tauber
>>>
>>> Center for Integrative Bioinformatics Vienna (CIBIV)
>>> (CIBIV is a joint institute of Vienna University and Medical 
>>> University)
>>> Max F. Perutz Laboratories (MFPL)
>>> Campus Vienna Biocenter 5 (VBC5), Ebene 1, Room 1812.2
>>> Dr. Bohr Gasse 9
>>> A-1030 Wien, Austria
>>> Phone: ++43 +1 / 42772-4030
>>> Fax:     ++43 +1 / 42772-4098
>>> email: stefanie.tauber at univie.ac.at 
>>> <mailto:stefanie.tauber at univie.ac.at>
>>> www.cibiv.at <http://www.cibiv.at>
>>>
>>
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at r-project.org
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives:
>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>



More information about the Bioconductor mailing list