[BioC] flip strand of GappedAlignmentPairs Object
Stefanie Tauber
stefanie.tauber at univie.ac.at
Thu Feb 28 10:13:27 CET 2013
Thanks a lot to both of you!
Best,
Stefanie
Am 27.02.2013 um 20:47 schrieb Valerie Obenchain <vobencha at fhcrc.org>:
> Thanks Herve.
>
> On 02/27/13 11:23, Hervé Pagès wrote:
>> Hi Stefanie, Val,
>>
>> Makes sense to have a strand setter for GappedAlignmentPairs
>> objects, like we have for GappedAlignments objects. I've added
>> it to GenomicRanges devel:
>>
>> > galp
>> GappedAlignmentPairs with 5 alignment pairs and 0 metadata columns:
>> seqnames strand : ranges -- ranges
>> <Rle> <Rle> : <IRanges> -- <IRanges>
>> EAS54_65:6:277:590:364 chr1 - : [ 681, 715] -- [ 503, 537]
>> EAS139_11:5:61:38:1182 chr1 - : [1388, 1422] -- [1205, 1239]
>> EAS188_7:4:21:443:404 chr1 + : [1345, 1379] -- [1529, 1563]
>> EAS1_105:6:23:885:274 chr2 + : [1089, 1123] -- [1289, 1323]
>> EAS1_108:1:65:787:74 chr1 - : [ 213, 247] -- [ 61, 95]
>> ---
>> seqlengths:
>> chr1 chr2
>> 1575 1584
>>
>> > strand(first(galp))
>> factor-Rle of length 5 with 3 runs
>> Lengths: 2 2 1
>> Values : - + -
>> Levels(3): + - *
>>
>> > strand(last(galp))
>> factor-Rle of length 5 with 3 runs
>> Lengths: 2 2 1
>> Values : + - +
>> Levels(3): + - *
>>
>> Flip the strand:
>>
>> > strand(galp) <- ifelse(strand(galp) == "+", "-", "+")
>> > galp
>> GappedAlignmentPairs with 5 alignment pairs and 0 metadata columns:
>> seqnames strand : ranges -- ranges
>> <Rle> <Rle> : <IRanges> -- <IRanges>
>> EAS54_65:6:277:590:364 chr1 + : [ 681, 715] -- [ 503, 537]
>> EAS139_11:5:61:38:1182 chr1 + : [1388, 1422] -- [1205, 1239]
>> EAS188_7:4:21:443:404 chr1 - : [1345, 1379] -- [1529, 1563]
>> EAS1_105:6:23:885:274 chr2 - : [1089, 1123] -- [1289, 1323]
>> EAS1_108:1:65:787:74 chr1 + : [ 213, 247] -- [ 61, 95]
>> ---
>> seqlengths:
>> chr1 chr2
>> 1575 1584
>>
>> > strand(first(galp))
>> factor-Rle of length 5 with 3 runs
>> Lengths: 2 2 1
>> Values : + - +
>> Levels(3): + - *
>>
>> > strand(last(galp))
>> factor-Rle of length 5 with 3 runs
>> Lengths: 2 2 1
>> Values : - + -
>> Levels(3): + - *
>>
>> Note that a slightly more efficient way to compute the flipped strand
>> is:
>>
>> runValue(strand(galp)) <- ifelse(runValue(strand(galp)) == "+",
>> "-", "+")
>>
>> This will only make a difference on big GappedAlignmentPairs objects of
>> course.
>>
>> > sessionInfo()
>> R Under development (unstable) (2013-02-19 r62008)
>> Platform: x86_64-unknown-linux-gnu (64-bit)
>>
>> locale:
>> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
>> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
>> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
>> [7] LC_PAPER=C LC_NAME=C
>> [9] LC_ADDRESS=C LC_TELEPHONE=C
>> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>>
>> attached base packages:
>> [1] parallel stats graphics grDevices utils datasets methods
>> [8] base
>>
>> other attached packages:
>> [1] Rsamtools_1.11.18 Biostrings_2.27.11 GenomicRanges_1.11.33
>> [4] IRanges_1.17.35 BiocGenerics_0.5.6
>>
>> loaded via a namespace (and not attached):
>> [1] bitops_1.0-5 stats4_3.0.0 tools_3.0.0 zlibbioc_1.5.0
>>
>>
>> GenomicRanges 1.11.33, will become available via biocLite(0 in the next
>> 24 hours or so.
>>
>> Cheers,
>> H.
>>
>>
>> On 02/27/2013 07:48 AM, Valerie Obenchain wrote:
>>> You could use the low-level constructor
>>>
>>> GappedAlignmentPairs(first, last, isProperPair, names=NULL)
>>>
>>> 'isProperPair' is a logical vector the same length as 'first' and
>>> 'last'. For example,
>>>
>>> GappedAlignmentPairs(left, right, !logical(length(left)))
>>>
>>>
>>> Valerie
>>>
>>> On 02/27/13 07:12, Stefanie Tauber wrote:
>>>> Hi Valerie,
>>>>
>>>> thanks for the quick answer.
>>>> I have already checked how the strand is assigned for paired-end data.
>>>> After setting the strand for the left and right parts,
>>>> can I somehow rebuild the gappedalignmentPairs object?
>>>>
>>>> best,
>>>> Stefanie
>>>>
>>>> Am 27.02.2013 um 16:06 schrieb Valerie Obenchain <vobencha at fhcrc.org
>>>> <mailto:vobencha at fhcrc.org>>:
>>>>
>>>>> Hi Stephanie,
>>>>>
>>>>> A GappedAlignmentPairs object is made up of a 'left' and a 'right'
>>>>> GappedAlignments.
>>>>>
>>>>> > ex1_file <- system.file("extdata", "ex1.bam", package="Rsamtools")
>>>>> > galp <- readGappedAlignmentPairs(ex1_file, use.names=TRUE)
>>>>>
>>>>> You can extract the left/right parts with an accessor then look at
>>>>> strand, cigars, gaps etc.
>>>>>
>>>>> > left <- left(galp)
>>>>> > class(left)
>>>>> [1] "GappedAlignments"
>>>>> attr(,"package")
>>>>> [1] "GenomicRanges"
>>>>>
>>>>> > strand(left)
>>>>> factor-Rle of length 1572 with 665 runs
>>>>> Lengths: 3 25 1 1 1 3 1 1 3 2 2 ... 1 6 1 2 3 2 1 5
>>>>> 1 51
>>>>> Values : - + - + - + - + - + - ... + - + - + - + -
>>>>> + -
>>>>> Levels(3): + - *
>>>>>
>>>>> > head(cigar(left))
>>>>> [1] "35M" "36M" "35M" "36M" "35M" "35M"
>>>>>
>>>>> > head(ngap(left))
>>>>> [1] 0 0 0 0 0 0
>>>>>
>>>>> It's on these left/right pairs that you can reset the strand.
>>>>>
>>>>> > strand(left)[1:3]
>>>>> factor-Rle of length 3 with 1 run
>>>>> Lengths: 3
>>>>> Values : +
>>>>> Levels(3): + - *
>>>>> > strand(left)[1:3] <- "-"
>>>>> > strand(left)[1:3]
>>>>> factor-Rle of length 3 with 1 run
>>>>> Lengths: 3
>>>>> Values : -
>>>>> Levels(3): + - *
>>>>>
>>>>> Please be sure to read the details on the man page regarding how the
>>>>> strands were assigned.
>>>>>
>>>>> ?GappedAlignmentPairs
>>>>>
>>>>>
>>>>> Valerie
>>>>>
>>>>> On 02/27/13 05:55, Stefanie Tauber wrote:
>>>>>> Dear list,
>>>>>>
>>>>>> Let's say I have single-end strand-specific RNA-Seq data.
>>>>>> Then I could read in the data with:
>>>>>>
>>>>>> library(GenomicRanges)
>>>>>>
>>>>>>> reads = readGappedAlignments("data.bam")
>>>>>> As I typically don't know which strand has been sequenced, it might
>>>>>> be necessary to flip the strand of the reads before doing any kind
>>>>>> of summarizeOverlap:
>>>>>>
>>>>>>> strand(reads) = ifelse(strand(reads) == "+", "-", "+")
>>>>>> Now I could do a 'summarizeOverlaps' between the reads and any kind
>>>>>> of transcript database.
>>>>>>
>>>>>> Now, if I have paired-end strand-specific RNA-Seq data.
>>>>>> I read the data:
>>>>>>> reads = readGappedAlignmentPairs("data.bam")
>>>>>> How can I flip the strand of the reads? As it's now paired-end data,
>>>>>> it does not work as above.
>>>>>> I could imagine several work-arounds, I just wonder if there is any
>>>>>> straight approach which I miss at the moment..
>>>>>>
>>>>>>
>>>>>> best regards,
>>>>>> Stefanie
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> [[alternative HTML version deleted]]
>>>>>>
>>>>>> _______________________________________________
>>>>>> Bioconductor mailing list
>>>>>> Bioconductor at r-project.org <mailto:Bioconductor at r-project.org>
>>>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>>>>> Search the archives:
>>>>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>>>>
More information about the Bioconductor
mailing list