[BioC] CIGAR-aware coverage

Hervé Pagès hpages at fhcrc.org
Fri Feb 28 08:29:38 CET 2014


Hi Chris,

On 02/27/2014 09:42 PM, Michael Lawrence wrote:
> On Thu, Feb 27, 2014 at 4:37 PM, chris warth <cswarth at gmail.com> wrote:
>
>> I would like to gather coverage data for gapped alignments.  I am
>> using readGAlignmentsFromBam() to read a GAlignments object,
>> converting to an IRanges object and calling IRanges::coverage() , but
>> that ignores CIGAR strings.
>>
>
> There appears to be a message to BIOC-devel on CIGAR-aware coverage
>> ( https://www.mail-archive.com/bioc-devel@r-project.org/msg01460.html )
>> that suggests using the 'GenomicAlignments' package.
>>
>>
> GenomicAlignments is mostly a reorganization of pre-existing functionality.
> You should be able to call coverage() directly on your GAlignments, or on
> the BamFile itself, with full support for cigar strings. Also, it is
> sufficient to call readGAlignments(), i.e., no need for "FromBam".
>
>
>> As far as I can tell that is only available under 2.14, the
>> development version of bioconductor,  which would also require running
>> R-devel.
>>
>> Is there any way to get CIGAR-aware coverage under bioconductor 2.13?
>>
>> Thanks in advance,
>>
>> Chris Warth
>> Fred Hutchinson CRC
>>
>> P.S. Not to distract from the above question, but I would also like
>> CIGAR-aware retrieval of alignments.   readGAlignmentsFromBam()
>> retrieves alignments whose gap completely spans the range I am
>> interested in.   I haven't found a way to exclude alignments that
>> don't have any actual base alignments overlapping the range I asked
>> for.

You cannot exclude alignments whose gap completely spans the range
you specify in the 'which' arg of the ScanBamParam object you pass
to readGAlignmentsFromBam(). However you can get rid of them right
after readGAlignmentsFromBam() returns with something like:

   my_ROI <- GRanges("chr1, IRanges(1964, 2014))  # my region of interest
   gal <- readGAlignmentsFromBam(bamfile, ScanBamParam(which=my_ROI))
   subsetByOverlaps(gal, my_ROI)

This subsetting will only keep those alignments that have at least 1
aligned nucleotide that falls within 'my_ROI'.

Cheers,
H.

>>
>> ====
>>> sessionInfo()
>> R version 3.0.2 (2013-09-25)
>> Platform: x86_64-unknown-linux-gnu (64-bit)
>>
>> locale:
>> [1] C
>>
>> attached base packages:
>> [1] parallel  stats     graphics  grDevices utils     datasets  methods
>> [8] base
>>
>> other attached packages:
>>   [1] wavelets_0.3-0       vegan_2.0-10         lattice_0.20-24
>>   [4] permute_0.8-3        rtracklayer_1.21.14  Rsamtools_1.13.52
>>   [7] Biostrings_2.29.19   BiocInstaller_1.12.0 GenomicRanges_1.14.4
>> [10] XVector_0.1.4        IRanges_1.20.6       BiocGenerics_0.7.8
>>
>> loaded via a namespace (and not attached):
>> [1] BSgenome_1.29.1 RCurl_1.95-4.1  XML_3.98-1.1    bitops_1.0-6
>> [5] compiler_3.0.2  grid_3.0.2      stats4_3.0.2    tools_3.0.2
>> [9] zlibbioc_1.7.0
>>
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at r-project.org
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives:
>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>
>
> 	[[alternative HTML version deleted]]
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
>

-- 
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpages at fhcrc.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319



More information about the Bioconductor mailing list