[BioC] flip strand of GappedAlignmentPairs Object

Stefanie Tauber stefanie.tauber at univie.ac.at
Thu Feb 28 10:13:27 CET 2013


Thanks a lot to both of you!
Best,
Stefanie


Am 27.02.2013 um 20:47 schrieb Valerie Obenchain <vobencha at fhcrc.org>:

> Thanks Herve.
> 
> On 02/27/13 11:23, Hervé Pagès wrote:
>> Hi Stefanie, Val,
>> 
>> Makes sense to have a strand setter for GappedAlignmentPairs
>> objects, like we have for GappedAlignments objects. I've added
>> it to GenomicRanges devel:
>> 
>>  > galp
>>  GappedAlignmentPairs with 5 alignment pairs and 0 metadata columns:
>>                         seqnames strand :       ranges -- ranges
>>                            <Rle>  <Rle> : <IRanges> --    <IRanges>
>>  EAS54_65:6:277:590:364     chr1      - : [ 681,  715] -- [ 503, 537]
>>  EAS139_11:5:61:38:1182     chr1      - : [1388, 1422] -- [1205, 1239]
>>   EAS188_7:4:21:443:404     chr1      + : [1345, 1379] -- [1529, 1563]
>>   EAS1_105:6:23:885:274     chr2      + : [1089, 1123] -- [1289, 1323]
>>    EAS1_108:1:65:787:74     chr1      - : [ 213,  247] -- [ 61,   95]
>>  ---
>>  seqlengths:
>>   chr1 chr2
>>   1575 1584
>> 
>>  > strand(first(galp))
>>  factor-Rle of length 5 with 3 runs
>>    Lengths: 2 2 1
>>    Values : - + -
>>  Levels(3): + - *
>> 
>>  > strand(last(galp))
>>  factor-Rle of length 5 with 3 runs
>>    Lengths: 2 2 1
>>    Values : + - +
>>  Levels(3): + - *
>> 
>> Flip the strand:
>> 
>>  > strand(galp) <- ifelse(strand(galp) == "+", "-", "+")
>>  > galp
>>  GappedAlignmentPairs with 5 alignment pairs and 0 metadata columns:
>>                         seqnames strand :       ranges -- ranges
>>                          <Rle>  <Rle> : <IRanges> --    <IRanges>
>>  EAS54_65:6:277:590:364     chr1      + : [ 681,  715] -- [ 503, 537]
>>  EAS139_11:5:61:38:1182     chr1      + : [1388, 1422] -- [1205, 1239]
>>   EAS188_7:4:21:443:404     chr1      - : [1345, 1379] -- [1529, 1563]
>>   EAS1_105:6:23:885:274     chr2      - : [1089, 1123] -- [1289, 1323]
>>    EAS1_108:1:65:787:74     chr1      + : [ 213,  247] -- [ 61,   95]
>>  ---
>>  seqlengths:
>>   chr1 chr2
>>   1575 1584
>> 
>>  > strand(first(galp))
>>  factor-Rle of length 5 with 3 runs
>>    Lengths: 2 2 1
>>    Values : + - +
>>  Levels(3): + - *
>> 
>>  > strand(last(galp))
>>  factor-Rle of length 5 with 3 runs
>>    Lengths: 2 2 1
>>    Values : - + -
>>  Levels(3): + - *
>> 
>> Note that a slightly more efficient way to compute the flipped strand
>> is:
>> 
>>  runValue(strand(galp)) <- ifelse(runValue(strand(galp)) == "+",
>>                                   "-", "+")
>> 
>> This will only make a difference on big GappedAlignmentPairs objects of
>> course.
>> 
>>  > sessionInfo()
>>  R Under development (unstable) (2013-02-19 r62008)
>>  Platform: x86_64-unknown-linux-gnu (64-bit)
>> 
>>  locale:
>>   [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
>>   [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
>>   [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
>>   [7] LC_PAPER=C                 LC_NAME=C
>>   [9] LC_ADDRESS=C               LC_TELEPHONE=C
>>  [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>> 
>>  attached base packages:
>>  [1] parallel  stats     graphics  grDevices utils     datasets methods
>>  [8] base
>> 
>>  other attached packages:
>>  [1] Rsamtools_1.11.18     Biostrings_2.27.11 GenomicRanges_1.11.33
>>  [4] IRanges_1.17.35       BiocGenerics_0.5.6
>> 
>>  loaded via a namespace (and not attached):
>>  [1] bitops_1.0-5   stats4_3.0.0   tools_3.0.0    zlibbioc_1.5.0
>> 
>> 
>> GenomicRanges 1.11.33, will become available via biocLite(0 in the next
>> 24 hours or so.
>> 
>> Cheers,
>> H.
>> 
>> 
>> On 02/27/2013 07:48 AM, Valerie Obenchain wrote:
>>> You could use the low-level constructor
>>> 
>>>     GappedAlignmentPairs(first, last, isProperPair, names=NULL)
>>> 
>>> 'isProperPair' is a logical vector the same length as 'first' and
>>> 'last'.  For example,
>>> 
>>>     GappedAlignmentPairs(left, right, !logical(length(left)))
>>> 
>>> 
>>> Valerie
>>> 
>>> On 02/27/13 07:12, Stefanie Tauber wrote:
>>>> Hi Valerie,
>>>> 
>>>> thanks for the quick answer.
>>>> I  have already checked how the strand is assigned for paired-end data.
>>>> After setting the strand for the left and right parts,
>>>> can I somehow rebuild the gappedalignmentPairs object?
>>>> 
>>>> best,
>>>> Stefanie
>>>> 
>>>> Am 27.02.2013 um 16:06 schrieb Valerie Obenchain <vobencha at fhcrc.org
>>>> <mailto:vobencha at fhcrc.org>>:
>>>> 
>>>>> Hi Stephanie,
>>>>> 
>>>>> A GappedAlignmentPairs object is made up of a 'left' and a 'right'
>>>>> GappedAlignments.
>>>>> 
>>>>> >  ex1_file <- system.file("extdata", "ex1.bam", package="Rsamtools")
>>>>> >      galp <- readGappedAlignmentPairs(ex1_file, use.names=TRUE)
>>>>> 
>>>>> You can extract the left/right parts with an accessor then look at
>>>>> strand, cigars, gaps etc.
>>>>> 
>>>>> > left <- left(galp)
>>>>> > class(left)
>>>>> [1] "GappedAlignments"
>>>>> attr(,"package")
>>>>> [1] "GenomicRanges"
>>>>> 
>>>>> > strand(left)
>>>>> factor-Rle of length 1572 with 665 runs
>>>>> Lengths:  3 25  1  1  1  3  1  1  3  2  2 ...  1  6  1  2 3  2 1  5
>>>>> 1 51
>>>>> Values :  -  +  -  +  -  +  -  +  -  +  - ...  +  -  +  - +  - +  -
>>>>> +  -
>>>>> Levels(3): + - *
>>>>> 
>>>>> > head(cigar(left))
>>>>> [1] "35M" "36M" "35M" "36M" "35M" "35M"
>>>>> 
>>>>> > head(ngap(left))
>>>>> [1] 0 0 0 0 0 0
>>>>> 
>>>>> It's on these left/right pairs that you can reset the strand.
>>>>> 
>>>>> > strand(left)[1:3]
>>>>> factor-Rle of length 3 with 1 run
>>>>> Lengths: 3
>>>>> Values : +
>>>>> Levels(3): + - *
>>>>> > strand(left)[1:3] <- "-"
>>>>> > strand(left)[1:3]
>>>>> factor-Rle of length 3 with 1 run
>>>>> Lengths: 3
>>>>> Values : -
>>>>> Levels(3): + - *
>>>>> 
>>>>> Please be sure to read the details on the man page regarding how the
>>>>> strands were assigned.
>>>>> 
>>>>> ?GappedAlignmentPairs
>>>>> 
>>>>> 
>>>>> Valerie
>>>>> 
>>>>> On 02/27/13 05:55, Stefanie Tauber wrote:
>>>>>> Dear list,
>>>>>> 
>>>>>> Let's say I have single-end strand-specific RNA-Seq data.
>>>>>> Then I could read in the data with:
>>>>>> 
>>>>>> library(GenomicRanges)
>>>>>> 
>>>>>>> reads = readGappedAlignments("data.bam")
>>>>>> As I typically don't know which strand has been sequenced, it might
>>>>>> be necessary to flip the strand of the reads before doing any kind
>>>>>> of summarizeOverlap:
>>>>>> 
>>>>>>> strand(reads) = ifelse(strand(reads) == "+", "-", "+")
>>>>>> Now I could do a 'summarizeOverlaps' between the reads and any kind
>>>>>> of transcript database.
>>>>>> 
>>>>>> Now, if I have paired-end strand-specific RNA-Seq data.
>>>>>> I read the data:
>>>>>>> reads = readGappedAlignmentPairs("data.bam")
>>>>>> How can I flip the strand of the reads? As it's now paired-end data,
>>>>>> it does not work as above.
>>>>>> I could imagine several work-arounds, I just wonder if there is any
>>>>>> straight approach which I miss at the moment..
>>>>>> 
>>>>>> 
>>>>>> best regards,
>>>>>> Stefanie
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> [[alternative HTML version deleted]]
>>>>>> 
>>>>>> _______________________________________________
>>>>>> Bioconductor mailing list
>>>>>> Bioconductor at r-project.org <mailto:Bioconductor at r-project.org>
>>>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>>>>> Search the archives:
>>>>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>>>> 



More information about the Bioconductor mailing list