[BioC] DiffBind error loading dba.count

Gordon Brown Gordon.Brown at cruk.cam.ac.uk
Tue Feb 5 20:38:39 CET 2013


Hi, folks,

Around 100M reads should only take about 2 GB (on 64-bit hardware), and memory
use should scale slightly sub-linearly with the number of reads, so either
you've got a *whole* lot of reads, or something's wrong.  Aside from
Rory's questions, can you pass along more details about the hardware
you're running on (e.g. processor architecture)?  Also, do you mind
sharing the species of your experiment?  If the reference genome is in a
very large number of contigs, as opposed to a few tens of chromosomes, I
could imagine ways things could go wrong.
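
To put rough numbers on that, here is a back-of-envelope sketch in Python. It assumes nothing beyond the rule of thumb above (about 2 GB per 100M reads) and extrapolates linearly. Since the scaling is slightly sub-linear, the linear figure should over-estimate memory above 100M reads. The function names are made up for illustration and are not part of DiffBind.

```python
# Back-of-envelope only: the 2 GB per 100M reads figure is the rule of thumb
# quoted above, not a measured property of dba.count.
REFERENCE_READS = 100_000_000   # reads in the rule of thumb
REFERENCE_GB = 2.0              # approximate memory for that many reads


def memory_upper_bound_gb(n_reads):
    """Linear extrapolation of the rule of thumb.

    For n_reads above REFERENCE_READS, slightly sub-linear scaling means
    real usage should be at or below this value.
    """
    return REFERENCE_GB * n_reads / REFERENCE_READS


def reads_implied_by_memory(memory_gb):
    """Fewest reads that could account for memory_gb under the linear rule."""
    return int(memory_gb * REFERENCE_READS / REFERENCE_GB)


if __name__ == "__main__":
    for gb in (40, 50):
        print(gb, "GB would need at least", reads_implied_by_memory(gb), "reads")
```

By this estimate, 40-50 GB would correspond to at least 2 to 2.5 billion reads in memory at once, which is why something else seems likely to be going on.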

If you really do have that many reads, you could try down-sampling the
data first.  If it's more or less routine ChIP data, you probably only
need 20-30 million reads per sample to get a reasonable signal and run an
analysis.
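
A minimal sketch of the down-sampling idea, in Python: work out what fraction of reads to keep to land near a target depth, then keep each read independently with that probability. The 25 million default is just the middle of the 20-30 million range above, and the function names and the in-memory list of reads are assumptions for illustration only.

```python
import random


def sampling_fraction(total_reads, target_reads=25_000_000):
    """Fraction of reads to keep so that about target_reads remain (at most 1.0)."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return min(1.0, target_reads / total_reads)


def downsample(reads, fraction, seed=0):
    """Keep each read independently with probability `fraction`.

    `reads` is any iterable of read records; a fixed seed makes the
    selection reproducible.
    """
    rng = random.Random(seed)
    return [read for read in reads if rng.random() < fraction]


if __name__ == "__main__":
    frac = sampling_fraction(100_000_000)      # 100M reads down to ~25M
    toy_reads = range(100_000)                 # stand-in for real read records
    kept = downsample(toy_reads, frac, seed=1)
    print(frac, len(kept))
```

On real BAM files you would more likely use an existing tool for this (for example, samtools view has a -s subsampling option) than load reads into a list. For paired-end data, both mates of a pair need to be kept or dropped together, which this toy version does not handle.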

Cheers,

 - Gord


On 2013-02-05 12:02, "Rory Stark" <Rory.Stark at cruk.cam.ac.uk> wrote:

>Hello Doron-
>
>
>Yes, the memory usage when calling dba.count is definitely an issue, one
>we are planning on addressing in the next version. I'll let you know when
>that is available.
>
>
>I see you are running dba.count with bParallel=FALSE, so you should only
>be reading in one file at a time. How large (in GB, or how many reads) is
>your largest BAM file? I've never seen dba.count use this much memory!
>Let us know the sizes so we can see if it is something we should be
>debugging. Please also send the output of sessionInfo().
>
>
>Besides changing dba.count to not use so much memory, we are also
>implementing an option to read the counts in directly as you have
>suggested. I am hoping to check this option in fairly soon (I already
>have a version of it running and use it regularly for RNA-seq data).
>
>
>Regards-
>Rory  
>
>
>From: Doron Betel <dob2014 at med.cornell.edu>
>Organization: WCMC
>Date: Fri, 1 Feb 2013 18:05:02 -0500
>To: Rory Stark <rory.stark at cancer.org.uk>
>Subject: Re: [BioC] DiffBind error loading dba.count
>Resent-From: Rory Stark <rory.stark at cancer.org.uk>
>
>
>
>Hi Rory,
>I came across this thread in the mailing list when looking for a
>solution to a similar problem.
>
>I have 12 ChIP-seq samples with the associated ChIP and control BAM files.
>When I run the following call:
>fivehmc.peaks <- dba.count(fivehmc.peaks, minOverlap=2, bParallel=FALSE,
>bCorPlot=FALSE,maxFilter=10)
>
>The R session is killed by the Linux OS after consuming a huge amount of
>memory (at my last check it was ~40-50 GB).
>I have a Linux server with 100 GB of RAM, which should be more than enough
>to read in this data.
>
>I tried different options and poked a bit at the source code, but I can't
>find a solution to this.
>
>I can easily generate the count matrix for the peaks myself (for both
>ChIP and control), but I don't know if, and how, it is possible to add it
>to the DBA object without calling dba.count, and what data structure it
>would require. I really like the package, and it could potentially be very
>useful to me, but this large memory consumption is limiting its use.
>
>Any ideas how I can work around this problem?
>
>Thanks for your help,
>
>doron   
>-- 
>Doron Betel Ph.D.
>Assistant Professor of Computational Biomedicine
>Department of Medicine &
>Institute for Computational Biomedicine
>Weill Cornell Medical College
>


