[BioC] Rsamtools: Read a bam line by line

Martin Morgan mtmorgan at fhcrc.org
Sat Jul 7 19:55:50 CEST 2012


On 07/07/2012 10:10 AM, Alex Gutteridge wrote:
> Hi,
>
> I have an application where I have mapped reads to two different
> reference sequences and I have these two mappings as bam files. I now
> want to see for each read whether it mapped to one, both or neither of
> the two references (and where the mapping is to both see if one was
> better than the other). All my reading of the samtools docs suggested
> that there is no facility for indexing by read name as opposed to
> genomic coordinates, so the first way that occured to me to do this was
> to sort each bam by read name (which is possible using samtools) and
> then step through the two bams in parallel checking the alignment
> information for each read. There are ~120M reads, so reading the two
> bams into memory all in one go is not practical, but from what I can see
> Rsamtools doesn't have functions for reading bams either line by line or
> by chunks of lines. All the ScanBamParam arguments seem to refer to

see the development version and the yieldSize argument to BamFile. Martin


> reading chunks by genomic ranges - this isn't appropriate for my
> application as the genomic ranges between the two references are not
> necessarily comparable.
>
> So my question is really two-fold:
>
> 1) Am I right in my reading of the Rsamtools docs? Are there any lower
> level functions for reading bams I am ignoring that could do what I want?
> 2) My thought now is to just make sams from the sorted bams and step
> through using readLines() and my own sam parsing code. Is there an
> alternative strategy I have overlooked?
>


-- 
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M1 B861
Phone: (206) 667-2793



More information about the Bioconductor mailing list