[BioC] About the DiffBind dba.count() crash problems

Gordon Brown Gordon.Brown at cruk.cam.ac.uk
Fri Jun 7 11:10:29 CEST 2013


Dear Ken,

Thanks for the bug report.  Can you share your bed files with me (maybe
just the first 100 lines or so of each) so I can try to reproduce this
bug?  It must be a different one from the previous bed reading bug.

Thanks,

 - Gord


On 2013-06-06 23:50, "kentanaka at chiba-u.jp" <kentanaka at chiba-u.jp> wrote:

>Dear Dr. Brown, 
>
>I asked the questions earlier regarding the DiffBind last week.
>
>I have versioned up to R3.0.1 and Bioconductor 2.12, and I checked to
>see how DiffBind 1.6.2 works. But the dba.count() still crashes in the
>bed data and I attached the logs of the bed data below.
>
>Since the versioned up DiffBind 1.6.2 output the GEO DATA as an
>Alignment Data Sample, I downloaded the data and I checked to see what
>that is. And I found that the BWA bam files were created.
>
>I converted the fastq data on Th2 immune cells to bam files and I
>checked to see how it works. And finally, I could obtain the dba.report
>(). 
>
>Thank you very much for all your advice and I really appreciate for your
>kindness. 
>
>My Best Regards, 
>Ken Tanaka
>
>------------------------------------------------------------------------
>------------------------
>> sessionInfo()
>R version 3.0.1 (2013-05-16)
>Platform: x86_64-suse-linux-gnu (64-bit)
>
>locale:
>[1] C
>
>attached base packages:
>[1] parallel  stats     graphics  grDevices utils     datasets  methods
>[8] base     
>
>other attached packages:
>[1] DiffBind_1.6.2       Biobase_2.20.0       GenomicRanges_1.12.4
>[4] IRanges_1.18.1       BiocGenerics_0.6.0
>
>loaded via a namespace (and not attached):
>[1] RColorBrewer_1.0-5 amap_0.8-7         edgeR_3.2.3        gdata_2.12.
>0.2    
>[5] gplots_2.11.0.1    gtools_2.7.1       limma_3.16.5       stats4_3.0.
>1      
>[9] zlibbioc_1.6.0
>
>
>> th2 = dba(sampleSheet="th2diffbind.csv")
>GATA3_Ab GATA3_Ab Th2 Resistant Full_Media 1 macs
>H3K4me3 H3K4me3 Th2 Responsive Full_Media 1 macs
>H3K9Ac H3K9Ac Th2 Responsive Full_Media 1 macs
>
>
>> th2
>3 Samples, 16806 sites in matrix (35676 total):
>        ID   Tissue Factor  Condition  Treatment Replicate Peak.caller
>1 GATA3_Ab GATA3_Ab    Th2  Resistant Full_Media         1        macs
>2  H3K4me3  H3K4me3    Th2 Responsive Full_Media         1        macs
>3   H3K9Ac   H3K9Ac    Th2 Responsive Full_Media         1        macs
>  Intervals
>1       464
>2     25875
>3     32569
>
>
>> th2 = dba.count(th2,minOverlap=3, bParallel=F)
>Sample: databed/Th2_GATA3_Ab.bed.gz
>
> *** caught segfault ***
>address (nil), cause 'memory not mapped'
>
>Traceback:
> 1: .Call("croi_load_reads", as.character(bamfile), as.integer(
>insertLength))
> 2: pv.getCounts(job, bed, insertLength, bWithoutDupes = bWithoutDupes,
>   bLowMem, yieldSize, mode, singleEnd, scanbamparam)
> 3: pv.listadd(results, pv.getCounts(job, bed, insertLength,
>bWithoutDupes = bWithoutDupes,     bLowMem, yieldSize, mode, singleEnd,
>scanbamparam))
> 4: pv.counts(DBA, peaks = peaks, minOverlap = minOverlap, defaultScore
>= score,     bLog = bLog, insertLength = insertLength, bOnlyCounts = T,
>   bCalledMasks = bCalledMasks, minMaxval = filter, bParallel =
>bParallel,     bUseLast = bUseLast, bWithoutDupes = bRemoveDuplicates,
>bScaleControl = bScaleControl,     filterFun = filterFun, bLowMem =
>bLowMem)
> 5: dba.count(th2, minOverlap = 3, bParallel = F)
>
>Possible actions:
>1: abort (with core dump, if enabled)
>2: normal R exit
>3: exit R without saving workspace
>4: exit R saving workspace
>Selection: 
>
>------------------------------------------------------------------------
>------------------------
>
># cat th2diffbind.csv
>SampleID,Tissue,Factor,Condition,Treatment,Replicate,bamReads,bamControl,
>ControlID,Peaks,PeakCaller,PeakFormat
>GATA3_Ab,GATA3_Ab,Th2,Resistant,Full_Media,1,databed/Th2_GATA3_Ab.bed.gz,
>databed/Th2_WCE.bed.gz,Th2_WCE_Control,peaks/GATA3_Ab_peaks.xls,macs,
>macs
>H3K4me3,H3K4me3,Th2,Responsive,Full_Media,1,databed/Th2_WT_anti_H3K4me3.
>bed.gz,databed/Th2_WT_input.bed.gz,Th2_WT_Control,peaks/H3K4me3_peaks.
>xls,macs,macs
>H3K9Ac,H3K9Ac,Th2,Responsive,Full_Media,1,databed/Th2_WT_anti_H3K9Ac.bed.
>gz,databed/Th2_WT_input.bed.gz,Th2_WT_Control,peaks/H3K9Ac_peaks.xls,
>macs,macs
>
>
>#zmore Th2_GATA3_Ab.bed.gz |head
>------> Th2_GATA3_Ab.bed.gz <------
>chrY    28285   28404   Th2_GATA3_Ab    1
>chrY    28363   28482   Th2_GATA3_Ab    1
>chrY    28466   28585   Th2_GATA3_Ab    1
>chrY    30469   30588   Th2_GATA3_Ab    1
>chrY    99937   100056  Th2_GATA3_Ab    1
>chrY    114597  114716  Th2_GATA3_Ab    1
>chrY    269098  269217  Th2_GATA3_Ab    1
>chrY    269773  269892  Th2_GATA3_Ab    1
>chrY    410005  410124  Th2_GATA3_Ab    1
>
>
># head -30 GATA3_Ab_peaks.xls
># This file is generated by MACS version 1.4.2 20120305
># ARGUMENTS LIST:
># name = GATA3_Ab
># format = BED
># ChIP-seq file = ../databed/Tst_GATA3_Ab.bed
># control file = ../databed/Tst_WCE.bed
># effective genome size = 1.87e+09
># band width = 300
># model fold = 10,30
># pvalue cutoff = 1.00e-05
># Large dataset will be scaled towards smaller dataset.
># Range for calculating regional lambda is: 1000 bps and 10000 bps
># tag size is determined as 119 bps
># total tags in treatment: 353649
># tags after filtering in treatment: 353649
># maximum duplicate tags at the same position in treatment = 1
># Redundant rate in treatment: 0.00
># total tags in control: 481971
># tags after filtering in control: 481971
># maximum duplicate tags at the same position in control = 1
># Redundant rate in control: 0.00
># d = 200
>chr     start   end     length  summit  tags    -10*log10(pvalue)
>fold_enrichment FDR(%)
>chr1    3704028 3704311 284     142     4       53.27   19.47   13.74
>chr1    3787290 3787781 492     299     4       69.45   68.14   10.89
>chr1    5694647 5695942 1296    189     11      104.10  25.55   8.33
>chr1    5794055 5794296 242     121     4       85.91   38.94   4.17
>chr1    6125574 6126093 520     337     7       56.83   16.22   13.71
>chr1    6444066 6445227 1162    558     19      63.10   10.22   14.29
>
>------------------------------
>------------------------------------------------------------------
>
>----- Original Message -----
>> Hi, Ken,
>> 
>> This bug is fixed in the current version of DiffBind, so if you
>upgrade,
>> it should go away.  If upgrading is not feasible, ensure that the bed
>> files all have 6 columns to work around the bug.
>> 
>> Cheers,
>> 
>>  - Gord
>> 
>> 
>> >Message: 23
>> >Date: Sat, 25 May 2013 16:55:00 +0900
>> >From: <kentanaka at chiba-u.jp>
>> >To: <bioconductor at r-project.org>
>> >Subject: [BioC] About the DiffBind dba.count() crash problems
>> >Message-ID: <20130525075500.0000701E.0473 at chiba-u.jp>
>> >Content-Type: text/plain; charset=UTF-8
>> >
>> >Hi, I'm Ken Tanaka.
>> >
>> >I'm currently interested in analyzing the DiffBind analysis by using
>the
>> >ChIP-seq data from Th2 immune cell samples.
>> >
>> >To be more specific, I would like to analyze this data (GSE28292) by
>> >using DiffBind analysis.
>> >
>> >I have questions regarding the dba.count().
>> >When I execute the dba.count(), it crashes.
>> >
>> >The bed data which I'm using doesn't include the 6th strand column.
>> >So, I suppose the crash problem doesn't originate from the problems
>> >regarding the columns.
>> >
>> >I would like to know how to modify the bed data which the DiffBind
>can
>> >read the bed file specifications.
>> >If you can inform me of these DiffBind bed file specifications which
>can
>> >read the bed data, I think I will be able to make the perl script for
>> >conversions. 
>> >So, could you kindly please let me know of these DiffBind bed file
>> >specifications which can read the bed data?
>> >
>> >I attached below the data and logs which I used for this analysis as
>> >follows. 
>> >
>> >My Best Regards,
>> >Ken Tanaka
>> >
>> >----------------------------------------------------------------
>> ># ChIP-seq bed data files.
>> >GSM773482_Th2_GATA3_Ab.bed.gz
>> >GSM773480_Th2_control_Ab.bed.gz
>> >GSM773484_Th2_WCE.bed.gz     (The 2 bed files listed above are the
>> >controls.)
>> >
>> >GSM773486_Th2_WT_anti_H3K27me3.bed.gz
>> >GSM773490_Th2_WT_anti_H3K9Ac.bed.gz
>> >GSM773492_Th2_WT_anti_H3K4me3.bed.gz
>> >GSM773488_Th2_WT_input.bed.gz (The 3 bed files listed above are the
>> >controls.)
>> >
>> >
>> ># macs14 1.4.2 20120305 peak calling output files.
>> >GATA3_Ab_peaks.bed
>> >control_Ab_peaks.bed
>> >
>> >H3K27me3_peaks.bed
>> >H3K4me3_peaks.bed
>> >H3K9Ac_peaks.bed
>> >
>> >
>> ># DiffBind sampleSheet file.
>> >%cat th2diffbind.csv
>> >SampleID,Tissue,Factor,Condition,Treatment,Replicate,bamReads,
>bamControl,
>> >ControlID,Peaks,PeakCaller,PeakFormat
>> >GATA3_Ab,GATA3_Ab,Th2,Resistant,Full_Media,1,databed/Th2_GATA3_Ab.bed.
>gz,
>> >databed/Th2_WCE.bed.gz,Th2_WCE_Control,peaks/GATA3_Ab_peaks.bed,macs,
>raw
>> >control_Ab,control_Ab,Th2,Resistant,Full_Media,1,databed/Th2_control_
>Ab.
>> >bed.gz,databed/Th2_WCE.bed.gz,Th2_WCE_Control,peaks/control_Ab_peaks.
>bed,
>> >macs,raw
>> >H3K27me3,H3K27me3,Th2,Responsive,Full_Media,1,databed/Th2_WT_anti_
>> >H3K27me3.bed.gz,databed/Th2_WT_input.bed.gz,Th2_WT_Control,peaks/
>> >H3K27me3_peaks.bed,macs,raw
>> >H3K4me3,H3K4me3,Th2,Responsive,Full_Media,1,databed/Th2_WT_anti_
>H3K4me3.
>> >bed.gz,databed/Th2_WT_input.bed.gz,Th2_WT_Control,peaks/H3K4me3_peaks.
>> >bed,macs,raw
>> >H3K9Ac,H3K9Ac,Th2,Responsive,Full_Media,1,databed/Th2_WT_anti_H3K9Ac.
>bed.
>> >gz,databed/Th2_WT_input.bed.gz,Th2_WT_Control,peaks/H3K9Ac_peaks.bed,
>> >macs,raw
>> >
>> >
>> >
>> >
>> >> th2 = dba(sampleSheet="th2diffbind.csv")
>> >GATA3_Ab GATA3_Ab Th2 Resistant Full_Media 1 macs
>> >control_Ab control_Ab Th2 Resistant Full_Media 1 macs
>> >H3K27me3 H3K27me3 Th2 Responsive Full_Media 1 macs
>> >H3K4me3 H3K4me3 Th2 Responsive Full_Media 1 macs
>> >H3K9Ac H3K9Ac Th2 Responsive Full_Media 1 macs
>> >> 
>> >> #th2
>> >> #str(th2)
>> >> #plot(th2)
>> >> 
>> >> # peaks counting reads
>> >> #th2 = dba.count(th2, bParallel=F)
>> >> th2 = dba.count(th2,minOverlap=3, bParallel=F)
>> >Sample: databed/Th2_GATA3_Ab.bed.gz
>> >
>> > *** caught segfault ***
>> >address 0x10, cause 'memory not mapped'
>> >
>> >Traceback:
>> > 1: .Call("croi_load_reads", as.character(bamfile), as.integer(
>> >insertLength))
>> > 2: pv.getCounts(job, bed, insertLength, bWithoutDupes =
>bWithoutDupes)
>> > 3: pv.listadd(results, pv.getCounts(job, bed, insertLength,
>> >bWithoutDupes = bWithoutDupes))
>> > 4: pv.counts(DBA, peaks = peaks, minOverlap = minOverlap,
>defaultScore
>> >= score,     bLog = bLog, insertLength = insertLength, bOnlyCounts =
>T,
>> >   bCalledMasks = bCalledMasks, minMaxval = maxFilter, bParallel =
>> >bParallel,     bUseLast = bUseLast, bWithoutDupes = bRemoveDuplicates,
>> >bScaleControl = bScaleControl)
>> > 5: dba.count(th2, minOverlap = 3, bParallel = F)
>> >
>> >Possible actions:
>> >1: abort (with core dump, if enabled)
>> >2: normal R exit
>> >3: exit R without saving workspace
>> >4: exit R saving workspace
>> >Selection: 1
>> >
>> >
>> >
>> >> sessionInfo()
>> >R version 2.15.2 (2012-10-26)
>> >Platform: x86_64-suse-linux-gnu (64-bit)
>> >
>> >locale:
>> > [1] LC_CTYPE=ja_JP.UTF-8       LC_NUMERIC=C
>> > [3] LC_TIME=ja_JP.UTF-8        LC_COLLATE=ja_JP.UTF-8
>> > [5] LC_MONETARY=ja_JP.UTF-8    LC_MESSAGES=ja_JP.UTF-8
>> > [7] LC_PAPER=C                 LC_NAME=C
>> > [9] LC_ADDRESS=C               LC_TELEPHONE=C
>> >[11] LC_MEASUREMENT=ja_JP.UTF-8 LC_IDENTIFICATION=C
>> >
>> >attached base packages:
>> >[1] stats     graphics  grDevices utils     datasets  methods   base
>> >
>> >other attached packages:
>> >[1] DiffBind_1.4.2       Biobase_2.18.0       GenomicRanges_1.10.7
>> >[4] IRanges_1.16.6       BiocGenerics_0.4.0
>> >
>> >loaded via a namespace (and not attached):
>> > [1] RColorBrewer_1.0-5 amap_0.8-7         edgeR_3.0.8        gdata_2.
>12.
>> >0      
>> > [5] gplots_2.11.0      gtools_2.7.0       limma_3.14.4
>parallel_2.
>> >15.2   
>> > [9] stats4_2.15.2      zlibbioc_1.4.0
>> >> 
>> >---------------------------------------------------------------------
>---
>> >---------
>> >
>> >--------------------------------------
>> >Ken Tanaka
>> >MD-PhD Candidate
>> >Chiba University Medical School
>> >
>> >
>> >
>> >------------------------------
>> >
>> >_______________________________________________
>> >Bioconductor mailing list
>> >Bioconductor at r-project.org
>> >https://stat.ethz.ch/mailman/listinfo/bioconductor
>> >
>> >
>> >End of Bioconductor Digest, Vol 123, Issue 26
>> >*********************************************
>> 
>> 
>
>



More information about the Bioconductor mailing list