[BioC] DiffBind - overlap between different peak callers

Paolo Kunderfranco paolo.kunderfranco at gmail.com
Tue Aug 7 12:27:16 CEST 2012


Dear All,
I am using package DiffBind,
Due to the fact that I miss real biological or technical replicates, I
am assesing the overlapping rates between different peak callers.
Here is how I set up my database to read in:

3 Cell lines, 4 Factor, 2 Replicates, for a total of 12 Conditions

> rm(list=ls())
> setwd("/home/pkunderf/output/DiffBind/ES-CMN-CMA/")
> test=dba(sampleSheet="database.csv")
> test
24 Samples, 19380 sites in matrix (40809 total):
                  ID Tissue Factor    Condition Replicate Intervals
1  mES_H3K27me3_m     ES  H3K27 mES_H3K27me3         1      1834
2  CMp_H3K27me3_m    CMN  H3K27 CMp_H3K27me3         1      2450
3  CMa_H3K27me3_m    CMA  H3K27 CMa_H3K27me3         1       990
4  mES_H3K27me3_s     ES  H3K27 mES_H3K27me3         2      2188
5  CMp_H3K27me3_s    CMN  H3K27 CMp_H3K27me3         2      3388
6  CMa_H3K27me3_s    CMA  H3K27 CMa_H3K27me3         2      5946
7   mES_H3K4me3_m     ES   H3K4  mES_H3K4me3         1     10243
8   CMp_H3K4me3_m    CMN   H3K4  CMp_H3K4me3         1     12476
9   CMa_H3K4me3_m    CMA   H3K4  CMa_H3K4me3         1      5632
10   mES_H3K4m3_s     ES   H3K4  mES_H3K4me3         2     14917
11  CMp_H3K4me3_s    CMN   H3K4  CMp_H3K4me3         2     10646
12  CMa_H3K4me3_s    CMA   H3K4  CMa_H3K4me3         2      5985
13  mES_H3K9me3_m     ES   H3K9  mES_H3K9me3         1       110
14  CMp_H3K9me3_m    CMN   H3K9  CMp_H3K9me3         1       484
15  CMa_H3K9me3_m    CMA   H3K9  CMa_H3K9me3         1       938
16  mES_H3K9me3_s     ES   H3K9  mES_H3K9me3         2       569
17  CMp_H3K9me3_s    CMN   H3K9  CMp_H3K9me3         2       808
18  CMa_H3K9me3_s    CMA   H3K9  CMa_H3K9me3         2      3747
19 mES_H3K79me2_m     ES  H3K79 mES_H3K79me2         1     21318
20 CMp_H3K79me2_m    CMN  H3K79 CMp_H3K79me2         1     13210
21 CMa_H3K79me2_m    CMA  H3K79 CMa_H3K79me2         1      2988
22 mES_H3K79me2_s     ES  H3K79 mES_H3K79me2         2     22164
23 CMp_H3K79me2_s    CMN  H3K79 CMp_H3K79me2         2     13429
24 CMa_H3K79me2_s    CMA  H3K79 CMa_H3K79me2         2      6063


> test=dba.count(test)

> test=dba.contrast(test, categories=DBA_CONDITION,minMembers=2)
#minMemebers was se to 2 due to
the fact I have 2 replicates for each condition
#....78 Contrasts:

> test = dba.analyze(test)

#for each condition generate a report
>test.DB1=dba.report(test,contrast=1)


I would like now to assess overlapping rates between replicates, and
generate a venn diagram.
but it seems like replicates are merged together and I cannot treat
them as single samples.
I noticed this behaviour also when i try to generate both a PCA plot
and a heatmap plot, replicates are merged, so
what's the utility to display both in the PCA or heatmap plot?
would it be simple to display the original replicates to assess how do
they cluster?

>pdf('PCA_plot_DBA_CONDITION.pdf')
>dba.plotPCA(test,attributes=DBA_CONDITION, vColors=c("grey0","grey23","grey47","antiquewhite4","antiquewhite3","antiquewhite1","dodgerblue4","dodgerblue","darkslategray1","brown4","brown","coral"))
>dev.off()

>pdf('HeatMap.pdf')
>corvals=dba.plotHeatmap(test)
>dev.off()

>pdf('HeatMap_RPKM_FOLD.pdf')
>corvals=dba.plotHeatmap(test,score=DBA_SCORE_RPKM_FOLD)
>dev.off()

> sessionInfo()
R version 2.15.1 (2012-06-22)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=C                 LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods
[8] base

other attached packages:
[1] DiffBind_1.2.0      Biobase_2.16.0      GenomicRanges_1.8.9
[4] IRanges_1.14.4      BiocGenerics_0.2.0

loaded via a namespace (and not attached):
 [1] amap_0.8-7         edgeR_2.6.10       gdata_2.11.0       gplots_2.11.0
 [5] gtools_2.7.0       limma_3.12.1       RColorBrewer_1.0-5 stats4_2.15.1
 [9] tools_2.15.1       zlibbioc_1.2.0



Any sugestion is appreciated,
cheers,
paolo



More information about the Bioconductor mailing list