[BioC] beadarray: Running BASH for 120 sections

Gavin Koh gavin.koh at gmail.com
Wed Apr 6 14:13:41 CEST 2011


Thanks. Is it possible to call setWeights() for only some of the arrays in
my beadData object instead? That would involve a lot less rewriting... G.
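
A minimal sketch of that idea: keep one beadData object, run BASH() on one
batch of arrays at a time, and store each batch's weights with setWeights()
before moving on. It is untested against real data and rests on two
assumptions: that setWeights() pairs the i-th element of wts with the i-th
index in array (check ?setWeights), and that the memory problem does not
simply build up across batches within one R session. make_batches() and
bash_in_batches() are hypothetical helper names, not part of beadarray.

```r
# Split array indices 1..n into consecutive batches of at most `size`.
make_batches <- function(n, size) {
  idx <- seq_len(n)
  split(idx, ceiling(idx / size))
}

# Run BASH() batch by batch and store the weights in the same object.
# Assumes library(beadarray) is loaded and beadData came from readIllumina().
bash_in_batches <- function(beadData, n, size = 12) {
  for (arrays in make_batches(n, size)) {
    bashOutput <- BASH(beadData, array = arrays)
    # Assumption: wts is a list with one element per entry in `arrays`.
    beadData <- setWeights(beadData, wts = bashOutput$wts, array = arrays)
  }
  beadData
}
```

Usage would be something like beadData <- bash_in_batches(beadData, n = 120),
with size = 12 matching one 12-section slide per batch.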

On 6 April 2011 12:30, Mike Smith <grimbough at gmail.com> wrote:
> Hi Gavin,
>
> I'm sending this to the BioC list as well so there's a record for anyone
> else looking for advice.
>
> The code you've suggested isn't quite what I had in mind.  In this case, I
> was suggesting the combine() function should be used on the beadData objects
> after the weights have been set.  Hopefully the pseudo code below will be
> understandable (anything in quotes needs to be specified for your data).
>
> So on our cluster I would run multiple R sessions.  Then I would read a
> subset of the arrays in each session, so:
>
> For session 1:
> beadData1 <- readIllumina("sections 1 : N")
> For session 2:
> beadData2 <- readIllumina("sections N+1 : 2N")
> etc...
>
> I find sticking to a BeadChip per R session is convenient, mostly due to the
> folder structure produced by BeadScan.
>
> Then in each R session you can run something like the following:
>
> ..."Any pre-BASH processing you like"....
> bashOutput <- BASH(beadData, array = 1:N)
> beadData <- setWeights(beadData, bashOutput$wts, array = 1:N)
> save(beadData, file = "beadData session N")
> quit...
>
> I'd then open a new R session and load the various beadData objects.
> You can then combine them with:
> beadData <- combine(beadData1, beadData2).
> If you have more than two you'll probably need a loop, I don't think our
> combine function takes more than two at a time, but it's worth checking the
> man page for that.
>
> You should then have one large beadData object with all the arrays and BASH
> weights.  As an alternative, you could skip the combining step, keep the
> separate R sessions open, and do any further processing right up to the
> summarization step.  I think I'm right in saying none of the QC, QA etc.
> requires information between chips, so each can be processed independently.
>
> That's probably all a bit messy, but feel free to ask any more questions.
>
> Mike
>
>
> On Tue, Apr 5, 2011 at 11:25 PM, Gavin Koh <gavin.koh at gmail.com> wrote:
>>
>> Dear Mike,
>> Thanks for replying so quickly.
>> R exits and throws me back to the system prompt.
>>
>> I'll try running array 53 alone first to see if that is the problem.
>> If that is not the problem, then I would like to try breaking it up
>> into batches as you suggest.
>> I've not used the Biobase combine() function before, but looking at
>> the help file, I would think that I could do
>>
>> bashOutput1 <- BASH(beadData, array=1:12)
>> bashOutput2 <- BASH(beadData, array=13:24)
>> .
>> .
>> .
>> bashOutput <- combine(bashOutput1, bashOutput2,...bashOutputn)
>> beadData <- setWeights(beadData, bashOutput$wts, array=1:n)
>>
>> Am I right?
>>
>> Thanks,
>>
>> Gavin.
>>
>> On 5 April 2011 13:30, Mike Smith <grimbough at gmail.com> wrote:
>> > Hi Gavin,
>> >
>> > I'm afraid that particular error means nothing to me.  Does R exit too,
>> > or does BASH stop and return you to an interactive session?
>> >
>> > I found this very old post on R exit codes
>> > (http://tolstoy.newcastle.edu.au/R/help/02b/3168.html), which may be
>> > relevant but I'm speculating at the moment.
>> >
>> > Is there anything particularly unusual with the 53rd array?  If you try
>> > to BASH that array in isolation, e.g. BASHoutput <- BASH(beadData, array=53),
>> > does it proceed ok?
>> >
>> > If it is a memory problem then it may be worth waiting for the next
>> > Bioconductor release in about a week.  I recently discovered a memory
>> > leak and a small bug that could cause a segfault in the BASH C code,
>> > which I've patched in the developmental version.  I conducted a test this
>> > morning with 4 HumanV3 sections and the memory leak was about 100MB, which
>> > isn't ideal, but with a 16GB limit I'd have thought you'd have enough
>> > head room not to be affected by it.
>> >
>> > Personally I've never tried to BASH so many sections in one go, but
>> > there's no reason it shouldn't work (memory and time permitting).  What we
>> > tend to do is read a smaller number of sections (say a single BeadChip)
>> > into an R session and process each separately.  We then save each
>> > separate object once it's been processed, load them all into a new R
>> > session and use the combine() function to create a single beadLevelData
>> > object.  That way it can be done in a sort of coarse-grained parallel.
>> >
>> > As far as making it parallel in a more friendly way, that's something
>> > we're working on, but it's not an imminent release.
>> >
>> > I hope that's some help,
>> >
>> >
>> > On Mon, Apr 4, 2011 at 9:51 PM, Gavin Koh <gavin.koh at gmail.com> wrote:
>> >>
>> >> I have 60 samples which were run on an Illumina HumanWG-6 v3.0
>> >> Expression BeadChip (so 120 sections) and I am doing the
>> >> pre-processing using beadarray.
>> >>
>> >> I am trying to generate spatial masks using BASH(). I have
>> >> successfully run a smaller analysis (one slide of 12 sections) on my
>> >> MacBook OSX Snow Leopard with 4Gb RAM using beadarray 2.7.
>> >>
>> >> The command I used to call BASH was:
>> >> BASHoutput <- BASH(beadData, array=1:n)
>> >>
>> >> I am running the full analysis (120 sections) on a computing cluster
>> >> (lustre). I have only requested a single core with 16Gb RAM, because I
>> >> don't know how to get BASH() to multithread (although in theory it
>> >> ought to be possible? it is a repetitive process after all). I cannot
>> >> get the script past 53 sections without BASH() terminating with exit
>> >> code "user code 2". Doesn't matter if I am running it interactively or
>> >> calling R CMD BATCH. I don't know what the exit code means, so I don't
>> >> know how to fix it. I don't think it is out of memory, because lustre
>> >> has other codes for reporting out-of-memory and R usually reports
>> >> out-of-memory errors as "cannot allocate vector of size..."? Also, the
>> >> previous time it ran out of memory (when I tried 6 Gb RAM), it was
>> >> lustre that terminated the process.
>> >>
>> >> I don't know if the problem is that BASH() cannot handle so many
>> >> sections. If that is in fact the problem, then there are two solutions
>> >> I can think of: 1. get BASH() to run multithreaded, or 2. run BASH()
>> >> on selected sections only.
>> >>
>> >> On inspection of the pseudoimages, I can see there are only two
>> >> sections of the 120 with obvious spatial defects (they look like
>> >> scratches). Is it possible to find outliers on the other sections
>> >> using the usual (faster) method (>3MAD) and then just use BASH() for
>> >> the two sections that are defective only? or...is there a tool to just
>> >> draw the masks myself??
>> >>
>> >> Thanks in advance,
>> >>
>> >> Gavin
>> >>
>> >> sessionInfo() reports:
>> >> R version 2.12.0 (2010-10-15)
>> >> Platform: x86_64-unknown-linux-gnu (64-bit)
>> >>
>> >> locale:
>> >>  [1] LC_CTYPE=en_GB.UTF-8       LC_NUMERIC=C
>> >>  [3] LC_TIME=en_GB.UTF-8        LC_COLLATE=en_GB.UTF-8
>> >>  [5] LC_MONETARY=C              LC_MESSAGES=C
>> >>  [7] LC_PAPER=en_GB.UTF-8       LC_NAME=C
>> >>  [9] LC_ADDRESS=C               LC_TELEPHONE=C
>> >> [11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C
>> >>
>> >> attached base packages:
>> >> [1] stats     graphics  grDevices utils     datasets  methods   base
>> >>
>> >> other attached packages:
>> >> [1] beadarray_2.0.6 Biobase_2.10.0
>> >>
>> >> loaded via a namespace (and not attached):
>> >> [1] limma_3.6.6
>> >>
>> >> --
>> >> Hofstadter's Law: It always takes longer than you expect, even when
>> >> you take into account Hofstadter's Law.
>> >> —Douglas Hofstadter (in Gödel, Escher, Bach, 1979)
>> >>
>> >> _______________________________________________
>> >> Bioconductor mailing list
>> >> Bioconductor at r-project.org
>> >> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> >> Search the archives:
>> >> http://news.gmane.org/gmane.science.biology.informatics.conductor
>> >
>> >
>> > --
>> > Mike Smith
>> > PhD Student
>> > Computational Biology Group
>> > Cambridge University
>> >



-- 
Hofstadter's Law: It always takes longer than you expect, even when
you take into account Hofstadter's Law.
—Douglas Hofstadter (in Gödel, Escher, Bach, 1979)