[BioC] deqseq_count and BWA-based SAM files

Simon Anders anders at embl.de
Tue Dec 13 23:12:14 CET 2011


Hi Wyatt

On 2011-12-13 22:06, Wyatt McMahon wrote:
> Unfortunately, none of these has worked.  I've used both Shan's
> script as well as samtools and am still having the same problem.
> Despite everything being very nicely sorted, I still getting the same
> error message.

Do you get the error for every read or only for some? The latter is 
typically harmless.

To explain: The way the SAM format stores paired-end reads is, IMO, 
rather unfortunate. Each mate gets its own SAM line, and the two lines 
can be far apart in the file. Once you sort by name, the mates will be 
close to each other (though they may still be interleaved if there is 
more than one alignment for the pair). HTSeq takes a chunk of adjacent 
lines with the same read ID and arranges them into matching pairs, 
using the MRNM and MPOS columns (RNEXT and PNEXT in the new 
terminology). If a line cannot be matched to a mate this way, the 
warning is displayed.
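
To make the pairing step concrete, here is a minimal Python sketch of the 
idea. It is not HTSeq's actual code; the function names (parse_sam_line, 
pair_up, pair_name_sorted) are made up for illustration, and it assumes 
name-sorted, tab-separated SAM lines with the standard column order.

```python
from itertools import groupby

def parse_sam_line(line):
    """Extract the fields needed for mate matching from one SAM alignment line."""
    f = line.rstrip("\n").split("\t")
    return {
        "qname": f[0],        # read ID
        "flag": int(f[1]),
        "rname": f[2],
        "pos": int(f[3]),
        "rnext": f[6],        # formerly MRNM
        "pnext": int(f[7]),   # formerly MPOS
    }

def mate_ref(rec):
    # In SAM, RNEXT "=" means "same reference as RNAME".
    return rec["rname"] if rec["rnext"] == "=" else rec["rnext"]

def points_at(a, b):
    """True if a's mate fields (RNEXT, PNEXT) name b's position."""
    return mate_ref(a) == b["rname"] and a["pnext"] == b["pos"]

def pair_up(records):
    """Match records sharing one read ID into (mate, mate) pairs; the rest are orphans."""
    pairs, orphans = [], []
    pending = list(records)
    while pending:
        a = pending.pop(0)
        for i, b in enumerate(pending):
            # One line must be the first mate (0x40) and the other must not be.
            opposite_ends = (a["flag"] & 0x40) != (b["flag"] & 0x40)
            if opposite_ends and points_at(a, b) and points_at(b, a):
                pairs.append((a, pending.pop(i)))
                break
        else:
            orphans.append(a)  # no line in this chunk matches: this would trigger a warning
    return pairs, orphans

def pair_name_sorted(lines):
    """Group adjacent lines by read ID, then pair within each group."""
    records = (parse_sam_line(l) for l in lines
               if l.strip() and not l.startswith("@"))  # skip blanks and header lines
    pairs, orphans = [], []
    for _, chunk in groupby(records, key=lambda r: r["qname"]):
        p, o = pair_up(chunk)
        pairs.extend(p)
        orphans.extend(o)
    return pairs, orphans
```

Note that groupby only merges adjacent lines with the same read ID, which 
is why the file has to be sorted by name first.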

Often, if you do some filtering, you remove the SAM line for a read 
but leave in the line for its mate. HTSeq will simply skip such reads 
but display the warning you saw. You can silence these warnings (but 
also all others) with the '-q' option if they bother you.
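
If you want to check whether only some reads are affected, a rough Python 
sketch like the following counts the read IDs whose first-mate and 
second-mate lines do not balance. The function name mate_balance is 
hypothetical, and the check relies only on the FLAG bits 0x40 (first in 
pair) and 0x80 (second in pair); lines with neither bit set are ignored.

```python
from collections import Counter

def mate_balance(sam_lines):
    """Return (number of paired read IDs seen, sorted list of IDs with a missing mate line)."""
    first = Counter()   # lines flagged as first mate, per read ID
    second = Counter()  # lines flagged as second mate, per read ID
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and SAM header lines
        fields = line.split("\t")
        qname, flag = fields[0], int(fields[1])
        if flag & 0x40:
            first[qname] += 1
        if flag & 0x80:
            second[qname] += 1
    names = set(first) | set(second)
    # A read ID is incomplete if it has more lines for one mate than for the other.
    incomplete = sorted(n for n in names if first[n] != second[n])
    return len(names), incomplete
```

A small fraction of incomplete IDs would be consistent with the filtering 
scenario described above; if nearly every ID is incomplete, something 
else is wrong with the file.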

   Simon


