[R] Problem with mclapply -- losing output/data

Rainer M Krug r.m.krug at gmail.com
Wed Mar 23 11:02:14 CET 2011


On Wed, Mar 23, 2011 at 10:42 AM, Patrick Connolly
<p_connolly at slingshot.co.nz> wrote:
> G'day Elizabeth,
>
> For what it's worth, this is what I'd do were I in a position
> like yours:
>
> I would put a condition near the end of myfunc that responded
> when there was an indication that NULLs were to be returned into
> your main list.  I'd make an additional list with those bits
> which would also collect sufficient information to work out which
> values of x lead to that result.  Then you'll be able to see
> which ones give the problem.
>
> Try running mclapply on only those bits and see if they all
> respond the same way.  If they do not, something very strange is
> happening.  But if those still behave the same way, then run with
> only a single value of x in your call to mclapply.
>
> I find the browser() function to be almost indispensable when working
> out what's causing such problems, but to my knowledge, it won't work
> when multiple cores are running in parallel.  If you use a single
> value of x, you can go back to using that trusted method.  You might
> also have to set mc.cores to 1, but I don't think so.
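
A rough sketch of what Patrick describes, in case it helps. Everything
here is made up for illustration: myfunc is a toy stand-in for the real
simulation, checked() is a hypothetical wrapper name, and I load parallel
(which took over mclapply from multicore in later R versions) only so
that the example runs on a current installation.

```r
library(parallel)  # assumption: stands in for the multicore package

# Toy stand-in for the real simulation function (hypothetical).
myfunc <- function(i) {
  if (i == 3) stop("simulated failure")
  sqrt(i)
}

# Hypothetical wrapper: keeps the input next to the result, so that a
# missing value can be traced back to the x that produced it.
checked <- function(i) {
  tryCatch(
    list(input = i, value = myfunc(i), error = NA_character_),
    error = function(e) {
      list(input = i, value = NULL, error = conditionMessage(e))
    }
  )
}

x   <- 1:5
out <- mclapply(x, checked, mc.cores = 1)  # raise mc.cores on a Unix-alike

# The "additional list": only the entries that came back empty.
problems <- Filter(function(r) is.null(r$value), out)
```

Each element of problems then carries the offending input and the error
message, which is usually enough to rerun just those cases by hand.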

Yes - I would set mc.cores to 1 and then try it again - then you can
see whether it is mclapply, i.e. the parallel bit, that causes it. I had
problems with one analysis where the analysis created temporary files of
the same name - therefore SEVERE interference when running in parallel - might
that be the case? Is the
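
To make the two checks concrete, here is a minimal sketch. It assumes
the parallel package as a stand-in for multicore, and myfunc is a
hypothetical worker that goes through a scratch file the way my analysis
did; the file naming scheme is my own choice, not something taken from
your code.

```r
library(parallel)  # assumption: stands in for the multicore package

# Hypothetical worker that writes to a scratch file and reads it back.
myfunc <- function(i) {
  # Task index and process id go into the name, so no two workers
  # can end up writing to the same file.
  scratch <- tempfile(pattern = sprintf("sim_%d_%d_", i, Sys.getpid()))
  on.exit(unlink(scratch))
  writeLines(as.character(i^2), scratch)
  as.numeric(readLines(scratch))
}

x <- 1:8
serial   <- lapply(x, myfunc)
one_core <- mclapply(x, myfunc, mc.cores = 1)

# With a single core the two results should agree exactly.
stopifnot(identical(serial, one_core))
```

If the results agree with mc.cores = 1 but differ with more cores,
shared files (or other shared state) become the first suspect.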

In addition: is your simulation deterministic or stochastic? Do the
NULLs always occur for the same indices? What happens if you use the
subset of your input vector, which includes the ones where you get
NULLs?
If your simulation calls many other functions, these might not be thread safe?
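
A small sketch of how these questions could be checked. Again parallel
stands in for multicore, and myfunc is a toy function that returns NULL
for some inputs on purpose, only so that there is something to find.

```r
library(parallel)  # assumption: stands in for the multicore package

# Toy function (hypothetical): returns NULL for every fifth input.
myfunc <- function(i) if (i %% 5 == 0) NULL else i * 2

x   <- 1:20
res <- mclapply(x, myfunc, mc.cores = 1)  # raise mc.cores on a Unix-alike

# The result should always be as long as the input.
stopifnot(length(res) == length(x))

# Which indices came back as NULL, and which as errors?
bad    <- which(vapply(res, is.null, logical(1)))
failed <- which(vapply(res, inherits, logical(1), what = "try-error"))

# Run it a second time: do the NULLs occur at the same indices?
res2 <- mclapply(x, myfunc, mc.cores = 1)
bad2 <- which(vapply(res2, is.null, logical(1)))
same_indices <- identical(bad, bad2)

# Rerun only the problematic inputs, serially, for closer inspection.
rerun <- lapply(x[bad], myfunc)
```

If same_indices is FALSE in the real simulation, or if the serial rerun
gives proper values for x[bad], that points at the parallel execution
rather than at the inputs themselves.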

So many things to test.

But I do not think that it is a problem in multicore as it is used quite widely.

Cheers,

Rainer

>
> HTH
>
> On Tue, 22-Mar-2011 at 01:13AM -0700, Elizabeth Purdom wrote:
>
>> Hello,
>> I am running large simulations, which unfortunately I can't really
>> replicate here because the code is so extensive. I rely heavily on
>> mclapply, but I realize that I'm losing data somewhere.
>>
>> There are two worrisome symptoms:
>> 1) I am getting 'NULL' as a return value for some (but not all) elements
>> of the output when I use mclapply, but not if I use lapply
>> > tmp2[1:3] #output from lapply
>> [[1]]
>> 10000076 10000077
>>       24       24
>>
>> [[2]]
>> 10000076 10000077
>>      119      119
>>
>> [[3]]
>> 10000076
>>       71
>>
>> > tmp[1:3] #output from mclapply
>> [[1]]
>> NULL
>>
>> [[2]]
>> NULL
>>
>> [[3]]
>> NULL
>>
>>
>> 2) I am not getting back a list the same length as my input vector I'm
>> parallelizing over. i.e. a command like this:
>>
>> tmp<-mclapply(x, FUN=myfunc, mc.cores=16)
>>
>> gives me back a list tmp which is not the same length as x (and so I'm
>> getting all kinds of errors)
>>
>> This is extremely discouraging, because I've been using mclapply
>> extensively at very many points on simulations that take a very long
>> time to run, and now I'm wondering if what I'm getting is trustworthy. I
>> don't think I could reasonably finish my results without mclapply, but I
>> am thinking of cutting it out except where it is absolutely necessary,
>> time-wise. If anyone had any suggestions as to why this might be
>> happening and how I can circumvent it (or test for it happening), I
>> would greatly appreciate it.
>>
>> Thanks,
>> Elizabeth Purdom
>>
>> > sessionInfo()
>> R version 2.12.1 (2010-12-16)
>> Platform: x86_64-pc-linux-gnu (64-bit)
>>
>> locale:
>>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
>> LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
>>  [5] LC_MONETARY=C              LC_MESSAGES=en_US.UTF-8
>> LC_PAPER=en_US.UTF-8       LC_NAME=C
>>  [9] LC_ADDRESS=C               LC_TELEPHONE=C
>> LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>>
>> attached base packages:
>> [1] stats     graphics  grDevices utils     datasets  methods   base
>>
>> other attached packages:
>> [1] multicore_0.1-4       msm_1.0               gtools_2.6.2
>> graph_1.28.0          Rsamtools_1.2.3
>> [6] Biostrings_2.18.2     GenomicFeatures_1.2.3 GenomicRanges_1.2.3
>> IRanges_1.8.9
>>
>> loaded via a namespace (and not attached):
>>  [1] Biobase_2.10.0     biomaRt_2.6.0      BSgenome_1.18.3    DBI_0.2-5
>>      mvtnorm_0.9-96     RCurl_1.5-0
>>  [7] RSQLite_0.9-4      rtracklayer_1.10.6 splines_2.12.1     survival_2.36-2
>>      tools_2.12.1       XML_3.2-0
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>
> --
> ~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.
>   ___    Patrick Connolly
>  {~._.~}                   Great minds discuss ideas
>  _( Y )_                 Average minds discuss events
> (:_~*~_:)                  Small minds discuss people
>  (_)-(_)                              ..... Eleanor Roosevelt
>
> ~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.
>
>



-- 
NEW GERMAN FAX NUMBER!!!

Rainer M. Krug, PhD (Conservation Ecology, SUN), MSc (Conservation
Biology, UCT), Dipl. Phys. (Germany)

Centre of Excellence for Invasion Biology
Natural Sciences Building
Office Suite 2039
Stellenbosch University
Main Campus, Merriman Avenue
Stellenbosch
South Africa

Cell:           +27 - (0)83 9479 042
Fax:            +27 - (0)86 516 2782
Fax:            +49 - (0)321 2125 2244
email:          Rainer at krugs.de

Skype:          RMkrug
Google:         R.M.Krug at gmail.com


