[BioC] Calculate Outer Positions of GappedAlignmentPairs

Hervé Pagès hpages at fhcrc.org
Wed Mar 20 20:34:32 CET 2013



On 03/20/2013 10:05 AM, Valerie Obenchain wrote:
> Hello,
>
> There are a couple of existing methods of GappedAlignmentPairs that may
> be helpful.
>
> Using 'galp' from the man page as an example,
>
> example(GappedAlignmentPairs)
>  > class(galp)
> [1] "GappedAlignmentPairs"
> attr(,"package")
> [1] "GenomicRanges"
>  > length(galp)
> [1] 1572
>
> To get your furthest outer positions, pull out the start of the
> left-most pair and the end of the right-most pair:
>
> outer <- cbind(start(left(galp)), end(right(galp)))
>  > head(outer)
>       [,1] [,2]
> [1,]   36  219
> [2,]   49  259
> [3,]   51  262
> [4,]   60  259
> [5,]   60  269
> [6,]   61  282
>
> For coverage, the left and right pair can be extracted and coerced to a
> GRAnges which retains introns in each pair as part of the range.
>
> grleft <- granges(left(galp))
> grright <- granges(right(galp))
>
> If we implemented a granges,GappedAlignmentPairs it should return
> something of the same length. To do this we'd have to merge the pair
> ranges which would then include the space between the ranges which I
> don't think we want. Right?

This is a good point. The new "granges" method for GappedAlignmentPairs
doesn't make any distinction between gaps (i.e. N's in the CIGAR's) and
the space between the first and last alignments of a pair. All those
gaps/spaces are "filled" to produce a single range per pair.

If this is what the user wants to do, then fine, now s/he has an easy
way to do it (with granges()). Whether it makes sense or not to call
coverage() on that thing is another story and entirely up to the user.

However note that this is NOT what Samtool's pileup (or whatever it's
called those days) does. Last time I checked (was a long time ago,
could have changed), you would need to do coverage(grglist(x)) on
GappedAlignments object 'x' to obtain something that is consistent
with Samtool's pileup. That means that gaps (i.e. N's in the CIGAR's)
do NOT generate coverage (but D's do!). I'm pretty confident that
the space between the first and last alignments of a pair doesn't
generate coverage either.

But that doesn't mean there are no situations where doing
coverage(granges(x)) on GappedAlignmentPairs object 'x' doesn't make
sense. Maybe there are. It's just not the usual way of doing things
and one needs to be aware of it and the implications that this will
have on the downstream analysis.

Cheers,
H.

>
> As an fyi, another potentially useful function is introns(). This
> returns a GRangesList of all intron ranges for each pair.
>
> introns <- introns(galp)
>
>
> Valerie
>
> On 03/19/2013 09:44 PM, Michael Lawrence wrote:
>> It would make sense for this to be granges,GappedAlignmentPairs. Want to
>> contribute it?
>>
>> Michael
>>
>>
>> On Tue, Mar 19, 2013 at 5:59 PM, Dario Strbenac
>> <d.strbenac at garvan.org.au>wrote:
>>
>>> Hello,
>>>
>>> I would like to calculate the furthest outer positions of the mapped
>>> read
>>> pairs. That is, the lowest and highest mapped coordinate of a read. In a
>>> later step, I want to get a coverage estimate that also includes
>>> coverage
>>> of the spliced out intron parts. I couldn't find any function already in
>>> GenomicRanges. I made my own code to create a GRanges object of these
>>> simple ranges, but was curious about any API functionality I overlooked
>>> that could do this.
>>>
>>> Thanks,
>>>             Dario.
>>>
>>> --------------------------------------
>>> Dario Strbenac
>>> PhD Student
>>> University of Sydney
>>> Camperdown NSW 2050
>>> Australia
>>> _______________________________________________
>>> Bioconductor mailing list
>>> Bioconductor at r-project.org
>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>> Search the archives:
>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>>
>>
>>     [[alternative HTML version deleted]]
>>
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at r-project.org
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives:
>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives:
> http://news.gmane.org/gmane.science.biology.informatics.conductor

-- 
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpages at fhcrc.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319



More information about the Bioconductor mailing list