[BioC] Diffbind: Binding Affinity Heatmap

Wed Aug 14 15:38:41 CEST 2013

Hi Giuseppe-

The standard heatmap plots the read densities of the differentially bound
sites. The x-axis clusters the samples, and the y-axis clusters the sites
based on the score for each site in each sample. The major clusters should
show similar patterns of binding levels.

In the example plot you sent, there are roughly three main clusters of
differentially bound sites. The bottom cluster has higher binding rates in
the first (red) sample group (group one gain/group two loss). The middle
cluster includes sites with higher binding rates in the second sample
group (group one loss/group two gain). The top cluster includes sites with
substantial binding in both groups that nonetheless show a significant
change in binding intensity; in general, these appear to go from moderate
binding in the first group to very strong binding in the second group
(especially in the sample cluster on the far right).

You can get a bit more contrast in these plots by using the "maxval"
parameter to clip the highly bound sites (the long tail to the right in the
Color Key). For example, in this case setting maxval=6 would probably give
a clearer picture of what patterns are driving the clustering of binding
sites.
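
As a rough sketch of the comparison, the two calls below draw the same
heatmap with and without clipping. This assumes the example analysis that
ships with DiffBind (data(tamoxifen_analysis), which loads an object named
"tamoxifen") rather than your own data, and that the first contrast is the
one of interest; the value 6 is only illustrative and should be chosen by
looking at your own Color Key.

```r
# Sketch only: uses DiffBind's bundled example analysis, not your samples.
library(DiffBind)
data(tamoxifen_analysis)   # loads a DBA object named `tamoxifen`

# Default colour scale: a few very strongly bound sites stretch the key.
dba.plotHeatmap(tamoxifen, contrast=1, correlations=FALSE)

# Same plot with scores above 6 clipped to 6 (6 is an illustrative value).
dba.plotHeatmap(tamoxifen, contrast=1, correlations=FALSE, maxval=6)
```

With your own analysis, substitute your DBA object for "tamoxifen" and
compare the two plots side by side.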

Cheers-
Rory

On 14/08/2013 11:37, "Giuseppe Gallone" <giuseppe.gallone at dpag.ox.ac.uk>
wrote:

>Hi Rory
>
>I have a further question about DiffBind. Could you tell me something
>more about the clustering visualisation obtained with
>dba.plotHeatmap(....correlations=FALSE)? I've carried out a differential
>analysis on my samples and I observe some interesting clustering using
>both edgeR and DESeq. I then plotted the heatmap using correlations=FALSE.
>
>I understand that the clustering obtained with dba.visualise is
>reproduced on the x axis (columns are grouped by clustering).
>
>What is shown instead on the y axis? Are these the individual
>differentially bound sites across the genome? What is the clustering
>described on the left?
>
>Thanks once again.
>
>Best
>Giuseppe
>
>On 07/23/13 18:22, Rory Stark wrote:
>> Hi Giuseppe-
>>
>> I'm glad the column thing is sorted out; that was what I suspected.
>>
>> There shouldn't be much problem doing the analysis without a control
>> track, particularly if the samples come from the same tissue. The main
>> role of the control tracks is for peak calling. The reason the control
>> track is less important for differential analysis is that you are
>> looking at the relative differences in read density at the same genomic
>> intervals across samples, and not comparing read densities across
>> intervals. So if the control track is similar at that location for all
>> samples, it will not affect the differential analysis. The main issue
>> would be if there were something like big copy number differences
>> between samples. Then you could get sites that show as differentially
>> bound when the real difference was the copy number. But the difference
>> would be real regardless.
>>
>> Regarding sequencing depth, this should be taken care of by the
>> normalisation step. It takes the library size (either full library size,
>> which is the total number of reads, or the default effective library
>> size, the number of reads within peaks for each sample) and adjusts the
>> read counts. You can get an idea of how this is working by using
>> dba.plotBox (with bAll=TRUE), comparing bNormalized=TRUE and
>> bNormalized=FALSE to see if things even out. Also, after counting, you
>> can look at the clustering (dba.plotPCA and dba.plotHeatmap) to see if
>> samples are grouping by sequencing depth -- try doing the same plots
>> with different scores, e.g. score=DBA_SCORE_READS, score=DBA_SCORE_RPKM,
>> and score=DBA_SCORE_TMM_READS_EFFECTIVE or
>> score=DBA_SCORE_TMM_READS_FULL, to see which gives the best clustering.
>>
>> Hope this helps!
>>
>> Cheers-
>> Rory