[BioC] CIGAR-aware coverage

Hervé Pagès hpages at fhcrc.org
Fri Feb 28 08:32:25 CET 2014



On 02/27/2014 11:29 PM, Hervé Pagès wrote:
> Hi Chris,
>
> On 02/27/2014 09:42 PM, Michael Lawrence wrote:
>> On Thu, Feb 27, 2014 at 4:37 PM, chris warth <cswarth at gmail.com> wrote:
>>
>>> I would like to gather coverage data for gapped alignments.  I am
>>> using readGAlignmentsFromBam() to read a GAlignments object,
>>> converting to an IRanges object and calling IRanges::coverage() , but
>>> that ignores CIGAR strings.
>>>
>>
>> There appears to be a message to BIOC-devel on CIGAR-aware coverage
>>> ( https://www.mail-archive.com/bioc-devel@r-project.org/msg01460.html )
>>> that suggests using the 'GenomicAlignments' package.
>>>
>>>
>> GenomicAlignments is mostly a reorganization of pre-existing
>> functionality.
>> You should be able to call coverage() directly on your GAlignments, or on
>> the BamFile itself, with full support for cigar strings. Also, it is
>> sufficient to call readGAlignments(), i.e., no need for "FromBam".
>>
>>
>>> As far as I can tell that is only available under 2.14, the
>>> development version of bioconductor,  which would also require running
>>> R-devel.
>>>
>>> Is there any way to get CIGAR-aware coverage under bioconductor 2.13?
>>>
>>> Thanks in advance,
>>>
>>> Chris Warth
>>> Fred Hutchinson CRC
>>>
>>> P.S. Not to distract from the above question, but I would also like
>>> CIGAR-aware retrieval of alignments.   readGAlignmentsFromBam()
>>> retrieves alignments whose gap completely spans the range I am
>>> interested in.   I haven't found a way to exclude alignments that
>>> don't have any actual base alignments overlapping the range I asked
>>> for.
>
> You cannot exclude alignments whose gap completely spans the range
> you specify in the 'which' arg of the ScanBamParam object you pass
> to readGAlignmentsFromBam(). However you can get rid of them right
> after readGAlignmentsFromBam() returns with something like:
>
>    my_ROI <- GRanges("chr1, IRanges(1964, 2014))  # my region of interest
>    gal <- readGAlignmentsFromBam(bamfile, ScanBamParam(which=my_ROI))
>    subsetByOverlaps(gal, my_ROI)

Trying this again:

   my_ROI <- GRanges("chr1", IRanges(1964, 2014))  # my region of interest
   gal <- readGAlignmentsFromBam(bamfile, param=ScanBamParam(which=my_ROI))
   subsetByOverlaps(gal, my_ROI)

Hope that works for you.

H.

>
> This subsetting will only keep those alignments that have at least 1
> aligned nucleotide that falls within 'my_ROI'.
>
> Cheers,
> H.
>
>>>
>>> ====
>>>> sessionInfo()
>>> R version 3.0.2 (2013-09-25)
>>> Platform: x86_64-unknown-linux-gnu (64-bit)
>>>
>>> locale:
>>> [1] C
>>>
>>> attached base packages:
>>> [1] parallel  stats     graphics  grDevices utils     datasets  methods
>>> [8] base
>>>
>>> other attached packages:
>>>   [1] wavelets_0.3-0       vegan_2.0-10         lattice_0.20-24
>>>   [4] permute_0.8-3        rtracklayer_1.21.14  Rsamtools_1.13.52
>>>   [7] Biostrings_2.29.19   BiocInstaller_1.12.0 GenomicRanges_1.14.4
>>> [10] XVector_0.1.4        IRanges_1.20.6       BiocGenerics_0.7.8
>>>
>>> loaded via a namespace (and not attached):
>>> [1] BSgenome_1.29.1 RCurl_1.95-4.1  XML_3.98-1.1    bitops_1.0-6
>>> [5] compiler_3.0.2  grid_3.0.2      stats4_3.0.2    tools_3.0.2
>>> [9] zlibbioc_1.7.0
>>>
>>> _______________________________________________
>>> Bioconductor mailing list
>>> Bioconductor at r-project.org
>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>> Search the archives:
>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>>
>>
>>     [[alternative HTML version deleted]]
>>
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at r-project.org
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives:
>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>
>

-- 
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpages at fhcrc.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319



More information about the Bioconductor mailing list