[BioC] About the DiffBind dba.count() crash problems

kentanaka at chiba-u.jp kentanaka at chiba-u.jp
Sat May 25 09:55:00 CEST 2013


Hi, I'm Ken Tanaka. 

I'm currently interested in analyzing ChIP-seq data from Th2 immune cell 
samples with DiffBind. 

To be more specific, I would like to analyze the GSE28292 dataset with 
DiffBind. 

I have a question about dba.count(). 
When I run dba.count(), R crashes with a segfault. 

The BED files I'm using do not include the sixth (strand) column. 
So I don't think the crash comes from a problem with that column. 
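
The column layout mentioned above could be checked with a short sketch 
like the following. This is only a minimal illustration in base R: 
bed_column_counts is a hypothetical helper, not a DiffBind function, and 
it assumes the BED files are tab-separated. It is demonstrated on a small 
temporary file rather than on the real data.

```r
# Hypothetical helper (not part of DiffBind): tabulate how many
# tab-separated columns the first n lines of a BED file contain.
bed_column_counts <- function(path, n = 1000) {
  # Open gzip-compressed files with gzfile(), plain text with file().
  con <- if (grepl("\\.gz$", path)) gzfile(path, "rt") else file(path, "rt")
  on.exit(close(con))
  lines <- readLines(con, n = n)
  # Skip header or comment lines that some BED files carry.
  lines <- lines[!grepl("^(track|browser|#)", lines)]
  fields <- strsplit(lines, "\t", fixed = TRUE)
  table(vapply(fields, length, integer(1)))
}

# Self-contained demonstration on a two-line, five-column file.
demo <- tempfile(fileext = ".bed")
writeLines(c("chr1\t100\t200\tread1\t0",
             "chr1\t300\t400\tread2\t0"), demo)
counts <- bed_column_counts(demo)
print(counts)
```

If the table shows more than one column count, the file mixes layouts, 
which would be worth ruling out before looking elsewhere.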

I would like to know how to modify my BED data so that DiffBind can read 
it. 
If you can tell me which BED file layout DiffBind expects, I think I will 
be able to write a Perl script to convert my files. 
Could you please let me know the BED file specification that DiffBind 
can read?

The data and logs I used for this analysis are attached below. 

Best regards, 
Ken Tanaka

----------------------------------------------------------------
# ChIP-seq bed data files.
GSM773482_Th2_GATA3_Ab.bed.gz
GSM773480_Th2_control_Ab.bed.gz
GSM773484_Th2_WCE.bed.gz     (Control for the 2 bed files listed above.)

GSM773486_Th2_WT_anti_H3K27me3.bed.gz
GSM773490_Th2_WT_anti_H3K9Ac.bed.gz
GSM773492_Th2_WT_anti_H3K4me3.bed.gz
GSM773488_Th2_WT_input.bed.gz (Control for the 3 bed files listed above.)


# macs14 1.4.2 20120305 peak calling output files.
GATA3_Ab_peaks.bed
control_Ab_peaks.bed

H3K27me3_peaks.bed
H3K4me3_peaks.bed
H3K9Ac_peaks.bed


# DiffBind sampleSheet file.
%cat th2diffbind.csv
SampleID,Tissue,Factor,Condition,Treatment,Replicate,bamReads,bamControl,ControlID,Peaks,PeakCaller,PeakFormat
GATA3_Ab,GATA3_Ab,Th2,Resistant,Full_Media,1,databed/Th2_GATA3_Ab.bed.gz,databed/Th2_WCE.bed.gz,Th2_WCE_Control,peaks/GATA3_Ab_peaks.bed,macs,raw
control_Ab,control_Ab,Th2,Resistant,Full_Media,1,databed/Th2_control_Ab.bed.gz,databed/Th2_WCE.bed.gz,Th2_WCE_Control,peaks/control_Ab_peaks.bed,macs,raw
H3K27me3,H3K27me3,Th2,Responsive,Full_Media,1,databed/Th2_WT_anti_H3K27me3.bed.gz,databed/Th2_WT_input.bed.gz,Th2_WT_Control,peaks/H3K27me3_peaks.bed,macs,raw
H3K4me3,H3K4me3,Th2,Responsive,Full_Media,1,databed/Th2_WT_anti_H3K4me3.bed.gz,databed/Th2_WT_input.bed.gz,Th2_WT_Control,peaks/H3K4me3_peaks.bed,macs,raw
H3K9Ac,H3K9Ac,Th2,Responsive,Full_Media,1,databed/Th2_WT_anti_H3K9Ac.bed.gz,databed/Th2_WT_input.bed.gz,Th2_WT_Control,peaks/H3K9Ac_peaks.bed,macs,raw
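
# Sketch: checking the paths in the sample sheet.
The sample sheet above refers to files by relative path, so one thing 
that could be ruled out is a path that does not resolve from the working 
directory. The following is only a minimal sketch in base R: 
missing_sheet_files is a hypothetical helper, not a DiffBind function, 
and the demonstration uses a throwaway sheet in a temporary directory 
rather than th2diffbind.csv.

```r
# Hypothetical helper (not part of DiffBind): return the files named in
# a sample sheet that cannot be found. Relative paths are resolved
# against the current working directory.
missing_sheet_files <- function(sheet,
                                cols = c("bamReads", "bamControl", "Peaks")) {
  samples <- read.csv(sheet, stringsAsFactors = FALSE)
  present <- intersect(cols, names(samples))
  paths <- unique(as.character(unlist(samples[present])))
  paths <- paths[!is.na(paths) & nzchar(paths)]
  paths[!file.exists(paths)]
}

# Self-contained demonstration: one file exists, one does not.
tmp <- tempfile("sheetdemo")
dir.create(tmp)
reads <- file.path(tmp, "reads.bed")
writeLines("chr1\t100\t200", reads)
sheet <- file.path(tmp, "sheet.csv")
write.csv(data.frame(SampleID = "demo",
                     bamReads = reads,
                     bamControl = file.path(tmp, "absent.bed"),
                     Peaks = reads),
          sheet, row.names = FALSE)
not_found <- missing_sheet_files(sheet)
print(basename(not_found))
```

For the real sheet the call would be missing_sheet_files("th2diffbind.csv"), 
run from the same directory as the R session below.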




> th2 = dba(sampleSheet="th2diffbind.csv")
GATA3_Ab GATA3_Ab Th2 Resistant Full_Media 1 macs
control_Ab control_Ab Th2 Resistant Full_Media 1 macs
H3K27me3 H3K27me3 Th2 Responsive Full_Media 1 macs
H3K4me3 H3K4me3 Th2 Responsive Full_Media 1 macs
H3K9Ac H3K9Ac Th2 Responsive Full_Media 1 macs
> 
> #th2
> #str(th2)
> #plot(th2)
> 
> # peaks counting reads
> #th2 = dba.count(th2, bParallel=F)
> th2 = dba.count(th2,minOverlap=3, bParallel=F)
Sample: databed/Th2_GATA3_Ab.bed.gz

 *** caught segfault ***
address 0x10, cause 'memory not mapped'

Traceback:
 1: .Call("croi_load_reads", as.character(bamfile), as.integer(insertLength))
 2: pv.getCounts(job, bed, insertLength, bWithoutDupes = bWithoutDupes)
 3: pv.listadd(results, pv.getCounts(job, bed, insertLength, bWithoutDupes = bWithoutDupes))
 4: pv.counts(DBA, peaks = peaks, minOverlap = minOverlap, defaultScore = score,
    bLog = bLog, insertLength = insertLength, bOnlyCounts = T,
    bCalledMasks = bCalledMasks, minMaxval = maxFilter, bParallel = bParallel,
    bUseLast = bUseLast, bWithoutDupes = bRemoveDuplicates, bScaleControl = bScaleControl)
 5: dba.count(th2, minOverlap = 3, bParallel = F)

Possible actions:
1: abort (with core dump, if enabled)
2: normal R exit
3: exit R without saving workspace
4: exit R saving workspace
Selection: 1



> sessionInfo()
R version 2.15.2 (2012-10-26)
Platform: x86_64-suse-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=ja_JP.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=ja_JP.UTF-8        LC_COLLATE=ja_JP.UTF-8    
 [5] LC_MONETARY=ja_JP.UTF-8    LC_MESSAGES=ja_JP.UTF-8   
 [7] LC_PAPER=C                 LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=ja_JP.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] DiffBind_1.4.2       Biobase_2.18.0       GenomicRanges_1.10.7
[4] IRanges_1.16.6       BiocGenerics_0.4.0  

loaded via a namespace (and not attached):
 [1] RColorBrewer_1.0-5 amap_0.8-7         edgeR_3.0.8        gdata_2.12.0
 [5] gplots_2.11.0      gtools_2.7.0       limma_3.14.4       parallel_2.15.2
 [9] stats4_2.15.2      zlibbioc_1.4.0
> 
----------------------------------------------------------------

--------------------------------------
Ken Tanaka
MD-PhD Candidate
Chiba University Medical School


