[BioC] DiffBind: Binding Affinity Heatmap

Fri Aug 16 11:56:10 CEST 2013

Hello Rory

I've been trying to do a differential analysis between my ChIP-exo 
samples. As we discussed before, these don't have biological 
replicates. Given my father-mother-child trio, I'd like to find 
differential peaks using the (child, !child) contrast.

The problem, as I've found out, is that edgeR requires at least two 
replicates, and dba.contrast has the same requirement.

I know this is an inherent problem with my dataset; however, I was 
wondering whether there is anything you'd try before going back to the 
lab people to tell them that replicates are needed for differential 
analysis.

For instance, I was considering using self-pseudo-replicates: randomly 
split each BAM file into two BAM files with equal numbers of reads, call 
peaks on each, and call these rep1 and rep2. Of course, this does not 
represent real biological variation for that sample. Is there anything 
else you'd suggest trying?
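The random split described above could look roughly like the following. This is a minimal sketch, assuming the reads have already been extracted as a list of read names (for example with a BAM library such as pysam); reading the BAM, writing each half back out as its own BAM file, and peak calling are not shown. The function name split_pseudo_replicates is hypothetical. Splitting by read name rather than by record keeps both mates of a paired-end read in the same pseudo-replicate.

```python
import random
from typing import List, Sequence, Tuple


def split_pseudo_replicates(read_names: Sequence[str], seed: int = 0) -> Tuple[List[str], List[str]]:
    """Randomly partition read names into two pseudo-replicates of (near-)equal size.

    Hypothetical helper. Duplicate names (e.g. the two mates of a paired-end
    read) are collapsed first, so both mates always land in the same half.
    """
    names = list(dict.fromkeys(read_names))  # unique names, first-seen order
    rng = random.Random(seed)                # seeded so the split is reproducible
    rng.shuffle(names)
    half = len(names) // 2                   # rep2 gets the extra read if the count is odd
    return names[:half], names[half:]


if __name__ == "__main__":
    # Ten hypothetical reads; "read3" appears twice, like a mate pair.
    reads = [f"read{i}" for i in range(10)] + ["read3"]
    rep1, rep2 = split_pseudo_replicates(reads, seed=42)
    print(len(rep1), len(rep2))        # 5 5
    print(set(rep1).isdisjoint(rep2))  # True
```

Because both halves are drawn from the same library, any differences between rep1 and rep2 reflect only sampling noise, which is exactly the limitation noted above.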

Alternatively, I have data for another trio (different HapMap samples, 
but the same ethnicity). Could I try using these as replicates for trio 
1? The differential sites obtained would, however, reflect more global 
patterns of variability within this population.

Thanks
Giuseppe

On 08/15/13 13:56, Rory Stark wrote:
> Hi Giuseppe-
>
>
> To compare different peak callers on the same replicate, you can look at
> the clustering/correlation at the peak level, but it doesn't make sense at
> the count level, as all the peaks are merged into a single consensus set at
> that point.
>
> You did this correctly in the first case by including a line for each peak
> caller with the same read (BAM) files. At that point you can get a
> correlation heatmap, PCA plot, etc., as well as look at overlaps (e.g. by
> using dba.plotVenn and/or dba.overlap).
>
> Once you create a binding matrix, as is done when you run dba.count, you
> are using a single "consensus" set of peaks for all the samples, and
> getting the number of reads in these peaks for each sample. So it no
> longer makes sense to have a different set of counts for each original
> peakset. This is a result of the peaks being "merged" (by default, all the
> peaks that appear in at least two peaksets are merged into a single set of
> peaks for the rest of the analysis).
>
> If you try what you suggest and use symbolic links, you should get exactly
> the same result for each virtual replicate; that is, the three entries
> should have correlation values of 1.0, as the same reads are being counted
> within the same (global, merged) consensus peakset.
>
> Cheers-
> Rory
>
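The counting step Rory describes (peaks merged into one consensus set, reads counted in that set for every sample, symlinked "replicates" correlating at 1.0) can be illustrated with a simplified Python sketch. It is not DiffBind's implementation: DiffBind is an R/Bioconductor package, and its exact merging and scoring rules may differ. The sketch assumes only what the reply states: peaks appearing in at least two peaksets are merged (min_overlap loosely mirrors DiffBind's minOverlap setting), and every entry pointing at the same BAM gets its reads counted against the same consensus peaks. All function names, peak coordinates and read positions are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Peak = Tuple[str, int, int]  # (chromosome, start, end), half-open [start, end)


def consensus_peaks(peaksets: Dict[str, List[Peak]], min_overlap: int = 2) -> List[Peak]:
    """Merge overlapping peaks from all peaksets into one set of regions and keep
    only regions supported by at least `min_overlap` distinct peaksets."""
    tagged = sorted((chrom, start, end, name)
                    for name, peaks in peaksets.items()
                    for chrom, start, end in peaks)
    result: List[Peak] = []
    cur = None       # current merged region as [chrom, start, end]
    support = set()  # names of the peaksets contributing to the current region
    for chrom, start, end, name in tagged:
        if cur is not None and chrom == cur[0] and start < cur[2]:
            cur[2] = max(cur[2], end)  # overlaps: extend the merged region
            support.add(name)
        else:
            if cur is not None and len(support) >= min_overlap:
                result.append((cur[0], cur[1], cur[2]))
            cur, support = [chrom, start, end], {name}
    if cur is not None and len(support) >= min_overlap:
        result.append((cur[0], cur[1], cur[2]))
    return result


def count_reads(reads: List[Tuple[str, int]], peaks: List[Peak]) -> List[int]:
    """Count reads whose (chromosome, position) falls inside each consensus peak."""
    return [sum(1 for rchrom, pos in reads if rchrom == chrom and start <= pos < end)
            for chrom, start, end in peaks]


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation of two equal-length count vectors (needs nonzero variance)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    # Hypothetical peak calls from three peak callers run on the same BAM file.
    peaksets = {
        "callerA": [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 100, 300)],
        "callerB": [("chr1", 150, 250), ("chr2", 250, 400)],
        "callerC": [("chr1", 520, 580), ("chr1", 900, 950)],
    }
    peaks = consensus_peaks(peaksets)
    print(peaks)  # [('chr1', 100, 250), ('chr1', 500, 600), ('chr2', 100, 400)]

    # Hypothetical read positions. Every entry (one per caller, or a symlink)
    # points at this same BAM, so each one gets identical counts.
    reads = [("chr1", 120), ("chr1", 130), ("chr1", 240), ("chr1", 550),
             ("chr2", 150), ("chr2", 160), ("chr2", 350), ("chr2", 390), ("chr1", 930)]
    counts = {name: count_reads(reads, peaks) for name in peaksets}
    print(counts["callerA"])  # [3, 1, 4]
    print(round(pearson(counts["callerA"], counts["callerB"]), 6))  # 1.0
```

Since every entry counts the same reads against the same merged peaks, the count vectors are identical and their correlation is 1.0 no matter which caller each entry came from; differences between callers are only visible before the merge, at the peak level.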