[BioC] diffbind, paired end reads

Tue Jun 18 10:51:40 CEST 2013

Hi, Rory, Kasper,

DiffBind's original counting code treats paired-end data the same as
single-end, i.e. each read is considered separately, with no attention
paid to whether it's part of a pair.  As long as all the libraries are the
same (all S.E. or all P.E.) it shouldn't affect the outcome; counts will
just be double for P.E. cases (disregarding improperly paired reads).
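
As a toy illustration of that point (a sketch only, not DiffBind's actual
counting code; the Read tuple, the helper names and the coordinates are all
made up for the example), here is per-read counting next to per-fragment
counting over a single peak:

```python
from collections import namedtuple

# Hypothetical minimal read record: the fragment (pair) it came from
# and its aligned interval, half-open [start, end).
Read = namedtuple("Read", "fragment start end")


def overlaps(read, peak_start, peak_end):
    """True if the read's interval intersects the peak interval."""
    return read.start < peak_end and read.end > peak_start


def count_reads(reads, peak_start, peak_end):
    """Count every overlapping read separately, ignoring pairing."""
    return sum(overlaps(r, peak_start, peak_end) for r in reads)


def count_fragments(reads, peak_start, peak_end):
    """Count each fragment once, however many of its mates overlap."""
    return len({r.fragment for r in reads if overlaps(r, peak_start, peak_end)})


reads = [
    Read("frag1", 100, 150), Read("frag1", 250, 300),  # properly paired
    Read("frag2", 120, 170), Read("frag2", 280, 330),  # properly paired
    Read("frag3", 200, 250),                           # only one mate aligned
]
peak = (90, 340)

per_read = count_reads(reads, *peak)
per_fragment = count_fragments(reads, *peak)
print(per_read, per_fragment)
```

The two properly paired fragments contribute two reads each under per-read
counting, while the fragment with a single aligned mate contributes one
either way, which is why the doubling is only exact when every read is
properly paired.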

To fix it properly, since we accept bed as well as bam, we'd have to test
for properly paired reads ourselves, possibly doing it just that bit
differently from other tools.  Personally, I'd rather not open that box...

Cheers,

 - Gord

On 2013-06-17 23:55, "Rory Stark" <Rory.Stark at cruk.cam.ac.uk> wrote:

>Hi Kasper-
> 
>From 1.6 on, if the bLowMem parameter is set to TRUE in dba.count,
>DiffBind will use summarizeOverlaps. Changing the config value
>DBA$config$singleEnd to FALSE allows it to use paired-end BAM files. The
>documentation for summarizeOverlaps in the GenomicRanges package
>explains how it handles paired-end data.
> 
>Gord, how does the default count code handle paired-end data (in pre-1.6
>DiffBind, or when bLowMem=FALSE)?
> 
>Cheers-
>Rory
> 
>________________________________________
>From: Kasper Daniel Hansen [kasperdanielhansen at gmail.com]
>Sent: 17 June 2013 21:51
>To: Rory Stark
>Subject: diffbind, paired end reads
>
>
>Hi Rory
>
>
>Just skimming DiffBind.  I was surprised there is not a longer discussion
>of dba.count() in the vignette regarding supported input formats.  I also
>read (skimmed) the man pages.  It is not clear to me if DiffBind supports
>paired end reads in the counting step (and also how it deals with, for
>example, a paired end read where only one mate aligns).
>
>
>Hope all is well in Cambridge.
>
>
>Best,
>Kasper