[BioC] Rsamtools: Read a bam line by line

Alex Gutteridge alexg at ruggedtextile.com
Sat Jul 7 10:10:58 CEST 2012


Hi,

I have an application where I have mapped reads to two different 
reference sequences and I have these two mappings as bam files. I now 
want to see for each read whether it mapped to one, both or neither of 
the two references (and where the mapping is to both see if one was 
better than the other). All my reading of the samtools docs suggested 
that there is no facility for indexing by read name as opposed to 
genomic coordinates, so the first way that occured to me to do this was 
to sort each bam by read name (which is possible using samtools) and 
then step through the two bams in parallel checking the alignment 
information for each read. There are ~120M reads, so reading the two 
bams into memory all in one go is not practical, but from what I can see 
Rsamtools doesn't have functions for reading bams either line by line or 
by chunks of lines. All the ScanBamParam arguments seem to refer to 
reading chunks by genomic ranges - this isn't appropriate for my 
application as the genomic ranges between the two references are not 
necessarily comparable.

So my question is really two-fold:

   1) Am I right in my reading of the Rsamtools docs? Are there any 
lower level functions for reading bams I am ignoring that could do what 
I want?
   2) My thought now is to just make sams from the sorted bams and step 
through using readLines() and my own sam parsing code. Is there an 
alternative strategy I have overlooked?

-- 
Alex Gutteridge



More information about the Bioconductor mailing list