[BioC] DiffBind -error with dba.counts

Tue Sep 17 19:01:23 CEST 2013

Hi Gordon

Please see below the session info:

 > sessionInfo()
R version 3.0.1 (2013-05-16)
Platform: x86_64-apple-darwin10.8.0 (64-bit)

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets methods   base

other attached packages:
[1] DiffBind_1.6.2       Biobase_2.20.1       GenomicRanges_1.12.5 IRanges_1.18.3       BiocGenerics_0.6.0   BiocInstaller_1.10.3

loaded via a namespace (and not attached):
  [1] amap_0.8-7         edgeR_3.2.4        gdata_2.13.2       gplots_2.11.3      gtools_3.0.0       limma_3.16.7       RColorBrewer_1.0-5 stats4_3.0.1
  [9] tools_3.0.1        zlibbioc_1.6.0

I have anywhere from 30-55 million reads for my samples. Yes, everything 
else on the machine does slow down quite a bit.

I am now running R locally, as we do not have R 3.0.1 installed for
command-line use. I'm not sure whether that matters.

Thanks for all your help.

Anitha

On 9/17/13 3:05 AM, Gordon Brown wrote:
> Hi, Anitha,
>
> What version of Bioconductor/DiffBind are you running, and how much memory
> does your computer have?  Older versions of DiffBind use a *lot* of memory
> in the counting stage, so if your computer is short on RAM, it could
> easily run out of memory and start swapping to disk, which will slow it
> down by orders of magnitude.  Does everything else on the machine slow
> down as well?
>
> Can you pass along the output from the "sessionInfo()" command?
>
> And if possible, upgrade to the latest version of DiffBind (if you're not
> there already) and try the "bLowMem" option on dba.count.
>
> Other than that, I can't think of any reason it should take hours, unless
> you have *really* big data files.  How many reads are in them, roughly?
>
>   - Gord
>
>
> On 2013-09-16 21:21, "Anitha Sundararajan" <asundara at ncgr.org> wrote:
>
>> Sorry, I actually did use minOverlap=2 (I didn't correct that when I
>> wrote the email, my bad).
>>
>>
>> On 9/16/13 1:59 PM, Anitha Sundararajan wrote:
>>> Hi Gordon
>>>
>>> I am now trying to run both reps for each sample, despite their low
>>> correlation.  When I try the
>>>
>>>> B73.H3K4=dba.count(B73.H3K4, minOverlap=3)
>>> the R-session just freezes and there is no response for hours.  I am
>>> not sure if there is anything wrong with any of my input files.  The
>>> sample sheet gets read in fine without any errors.
>>>
>>> Just FYI, my BED file (from MACS2) looks like:
>>>
>>>
>>> chr1    9128    9552    MACS_peak_1     105.25
>>> chr1    9918    10127   MACS_peak_2     4.72
>>> chr1    79482   79691   MACS_peak_3     5.10
>>> chr1    86963   87514   MACS_peak_4     50.23
>>> chr1    94579   94781   MACS_peak_5     5.10
>>> chr1    103763  103997  MACS_peak_6     5.10
>>> chr1    110722  111047  MACS_peak_7     97.69
>>> chr1    144929  145568  MACS_peak_8     127.78
>>> chr1    161344  162320  MACS_peak_9     136.89
>>> chr1    222479  223058  MACS_peak_10    77.67
>>> chr1    227130  227628  MACS_peak_11    17.02
>>> chr1    263835  263971  MACS_peak_12    12.60
>>> chr1    264068  264518  MACS_peak_13    58.01
>>> chr1    264625  265056  MACS_peak_14    68.16
>>> chr1    270509  271086  MACS_peak_15    47.15
>>> chr1    277629  277789  MACS_peak_16    13.25
>>>
>>> Not sure if this is the problem?
>>>
>>> Thanks so much.
>>>
>>> Anitha
>>>
>>> On 9/16/13 3:51 AM, Gordon Brown wrote:
>>>> Hi, Anitha,
>>>>
>>>> The basic problem is that you have two samples, but you're asking for a
>>>> minOverlap of 3 (i.e. for peaks which occur in at least 3 samples).  No
>>>> locations can satisfy that criterion, so you end up with an empty set of
>>>> peaks.
>>>>
>>>> The message is obscure, I will admit.  (It happens because DiffBind writes
>>>> out the unified set of peaks and reads it back in, for tedious
>>>> implementation reasons, and when it reads it back in, there are no peaks,
>>>> hence "no lines available in input".)
>>>>
>>>> Try using minOverlap=2.  But... having said that, I'm not sure how useful
>>>> DiffBind will be to you without replicates.
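Why minOverlap=3 can never succeed with two samples can be illustrated with a small model of consensus-peak filtering. This is a simplified sketch, not DiffBind's implementation: it assumes each peakset is a list of (chrom, start, end) intervals in half-open BED coordinates, that overlapping intervals from all samples are merged into one region, and that a region is kept only if at least `min_overlap` distinct samples contribute to it (integer thresholds only). The function name `consensus_peaks` and the toy intervals for `seed.1` are hypothetical.

```python
from collections import defaultdict


def consensus_peaks(peaksets, min_overlap):
    """Hypothetical, simplified model of minOverlap filtering.

    peaksets: dict mapping sample ID -> list of (chrom, start, end) tuples.
    Returns merged regions (chrom, start, end) supported by at least
    min_overlap distinct samples.
    """
    # Pool every interval, tagged with its sample, grouped by chromosome.
    by_chrom = defaultdict(list)
    for sample, peaks in peaksets.items():
        for chrom, start, end in peaks:
            by_chrom[chrom].append((start, end, sample))

    kept = []
    for chrom in sorted(by_chrom):
        cur_start, cur_end, samples = None, None, set()
        for start, end, sample in sorted(by_chrom[chrom]):
            if cur_end is not None and start < cur_end:
                # Overlaps the current merged region (half-open): extend it.
                cur_end = max(cur_end, end)
                samples.add(sample)
            else:
                # Close the previous region, keeping it only if enough
                # distinct samples support it.
                if cur_end is not None and len(samples) >= min_overlap:
                    kept.append((chrom, cur_start, cur_end))
                cur_start, cur_end, samples = start, end, {sample}
        if cur_end is not None and len(samples) >= min_overlap:
            kept.append((chrom, cur_start, cur_end))
    return kept


# Toy data: two samples, one region shared between them.
peaks = {
    "meio.1": [("chr1", 9128, 9552), ("chr1", 79482, 79691)],
    "seed.1": [("chr1", 9300, 9700), ("chr1", 110722, 111047)],  # hypothetical
}

print(consensus_peaks(peaks, 2))  # [('chr1', 9128, 9700)]
print(consensus_peaks(peaks, 3))  # []
```

With only two peaksets, no merged region can have three supporting samples, so any threshold above the sample count yields an empty set. Per Gordon's explanation, in DiffBind that empty set is what gets written out and read back, producing the "no lines available in input" error.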
>>>>
>>>> Cheers,
>>>>
>>>>    - Gord Brown
>>>>
>>>>
>>>>
>>>>> Message: 22
>>>>> Date: Fri, 13 Sep 2013 12:21:02 -0600
>>>>> From: Anitha Sundararajan <asundara at ncgr.org>
>>>>> To: bioconductor at r-project.org
>>>>> Subject: [BioC] DiffBind -error with dba.counts
>>>>> Message-ID: <5233578E.3090701 at ncgr.org>
>>>>> Content-Type: text/plain; charset=ISO-8859-1; format=flowed
>>>>>
>>>>> Hi
>>>>>
>>>>> I have been trying to use DiffBind to analyze our ChIP-seq data and have
>>>>> been running into errors repeatedly.
>>>>>
>>>>> I first created a samplesheet.csv describing my samples and it looks
>>>>> like this:
>>>>>
>>>>>
>>>>> SampleID,Tissue,Factor,Condition,Replicate,bamReads,bamControl,Peaks,PeakCaller
>>>>> meio.1,meiocytes,H3K4me3,N,1,M_meiocytes_H3K4me3.bam,InM_input_meiocytes.bam,meio.vs.in.rep1.def_peaks.bed,MACS
>>>>> seed.1,seedlings,H3K4me3,N,1,S_seedling_H3K4me3.bam,InS_input_seedling.bam,seed.vs.in.rep1.def_peaks.bed,MACS
>>>>>
>>>>>
>>>>> I only have two samples (and their respective inputs) with one replicate
>>>>> each, and the peaks were called using MACS v2. The peak caller generated
>>>>> .bed files, which were used in DiffBind.
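Before calling dba.count, a quick pre-flight check on the sample sheet can catch both empty fields and a minOverlap larger than the number of samples. The sketch below is hypothetical and independent of DiffBind: `check_sheet` is an illustrative name, the column names are taken from the sample sheet quoted in this message, and it treats minOverlap as an integer sample count.

```python
import csv
import io

# The sample sheet from this message, embedded so the sketch runs standalone.
SHEET = """SampleID,Tissue,Factor,Condition,Replicate,bamReads,bamControl,Peaks,PeakCaller
meio.1,meiocytes,H3K4me3,N,1,M_meiocytes_H3K4me3.bam,InM_input_meiocytes.bam,meio.vs.in.rep1.def_peaks.bed,MACS
seed.1,seedlings,H3K4me3,N,1,S_seedling_H3K4me3.bam,InS_input_seedling.bam,seed.vs.in.rep1.def_peaks.bed,MACS
"""


def check_sheet(text, min_overlap):
    """Hypothetical pre-flight check: return a list of problems found."""
    rows = list(csv.DictReader(io.StringIO(text)))
    problems = []
    for i, row in enumerate(rows, start=1):
        # DictReader fills missing trailing fields with None; extra fields
        # land under the key None and are ignored here.
        empty = [k for k, v in row.items()
                 if k is not None and (v is None or not v.strip())]
        if empty:
            problems.append(f"row {i}: empty fields {empty}")
    if min_overlap > len(rows):
        problems.append(
            f"minOverlap={min_overlap} exceeds the {len(rows)} samples in the sheet"
        )
    return problems


print(check_sheet(SHEET, 3))  # ['minOverlap=3 exceeds the 2 samples in the sheet']
print(check_sheet(SHEET, 2))  # []
```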
>>>>>
>>>>>
>>>>> I defined the working directory in R first.
>>>>>
>>>>> I then read the sample sheet in:
>>>>>> H3K4.B73=dba(sampleSheet='samplesheet2.csv',peakFormat='bed')
>>>>>> H3K4.B73
>>>>> 2 Samples, 38870 sites in matrix (45304 total):
>>>>>         ID    Tissue  Factor Condition Replicate Peak.caller Intervals
>>>>> 1 meio.1 meiocytes H3K4me3        N         1        MACS 44124
>>>>> 2 seed.1 seedlings H3K4me3         N         1        MACS 41596
>>>>>
>>>>> I generated a plot:
>>>>>> plot(H3K4.B73)
>>>>> And then when I tried to run dba.count, it failed every time.  I went
>>>>> through the list archives to find similar posts and could not find a
>>>>> solution.  I tried the following command:
>>>>>
>>>>>> H3K4.B73=dba.count(H3K4.B73, minOverlap=3)
>>>>> and this,
>>>>>> H3K4.B73=dba.count(H3K4.B73, minOverlap=3, bLowMem=TRUE)
>>>>>> H3K4.B73=dba.count(H3K4.B73, minOverlap=3, bLowMem=FALSE)
>>>>> And they all failed.
>>>>>
>>>>> My error in all three cases is as follows:
>>>>> Error in read.table(fn, skip = skipnum) : no lines available in input
>>>>>
>>>>> Please let me know if you have any insights on it.
>>>>>
>>>>> Thanks so much for your help in advance.
>>>>>
>>>>> Anitha Sundararajan Ph.D.
>>>>> Research Scientist
>>>>> National Center for Genome Resources
>>>>> Santa Fe, NM 87505