[R] which parallel routine

Jeff Newmiller jdnewmil at dcn.davis.ca.us
Sat Jan 7 22:23:10 CET 2017


There is always overhead in starting and stopping parallel processes, but the fact that the per-subject terms are "slow and complex" suggests to me that this overhead is already a small price relative to the computation itself.

mclapply tends to be good when you need to share a lot of the same data with all processes and have many processors with shared memory. snow-style clusters tend to be good when you really need lots of processors or lots of temporary memory and the common inputs are relatively small.
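
For concreteness, here is a minimal sketch of the two approaches. The names subject_ids, params, and per_subject_fun are hypothetical stand-ins for your per-subject likelihood term, not anything from your code:

    library(parallel)

    ## Hypothetical stand-ins for illustration only
    subject_ids     <- 1:5000
    params          <- c(0.5, 1.2)
    per_subject_fun <- function(id, p) sum(dnorm(id, mean = p[1], sd = p[2], log = TRUE))

    ## Forked workers (Unix-alikes only): children see the parent's data via
    ## copy-on-write, so large shared objects are not copied up front.
    res_fork <- mclapply(subject_ids, per_subject_fun, p = params, mc.cores = 4)

    ## Socket ("snow"-style) cluster: separate R processes on any platform,
    ## possibly spread over several machines; any common objects the function
    ## reads implicitly would need clusterExport(cl, ...) first.
    cl <- makeCluster(4)
    res_sock <- parLapply(cl, subject_ids, per_subject_fun, p = params)
    stopCluster(cl)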

If you think the overhead is still hurting you, break your subjects into bigger groups and process one group per task in each pass (see the sketch below). To reduce overhead further, avoid passing data into each process that it doesn't use. If you need more cores or processors than you actually have, parallelizing won't help. It is unreasonable to expect the default parallel code in an algorithm-oriented library function to make these choices to fit your specific constraints.
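
A rough sketch of that chunking, reusing the hypothetical names from the example above and assuming the per-subject function returns a single number:

    n_cores <- 4
    ## One contiguous chunk of subjects per core, so each task does a lot of
    ## work and the startup/communication cost is paid only a few times per pass.
    chunks <- split(subject_ids, cut(seq_along(subject_ids), n_cores, labels = FALSE))

    res <- mclapply(chunks, function(ids) {
      vapply(ids, per_subject_fun, numeric(1), p = params)
    }, mc.cores = n_cores)

    ## each likelihood evaluation is the sum of the per-subject terms, as in the question
    loglik <- sum(unlist(res))

(With the default mc.preschedule = TRUE, mclapply already divides the input into roughly one chunk per core, so explicit grouping like this mainly buys you control when the per-subject costs are very uneven.)
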
-- 
Sent from my phone. Please excuse my brevity.

On January 7, 2017 12:15:57 PM PST, "Therneau, Terry M., Ph.D." <therneau at mayo.edu> wrote:
>I'm looking for advice on which of the parallel systems to use.
>
>Context: maximize a likelihood, each evaluation is a sum over a large
>number of subjects (>5000) and each of those per subject terms is slow
>and complex.
>
>If I were using optim the context would be
>  fit <- optim(initial.values, myfun, ...)
>  myfun <- function(params) {
>       ... Do some initial setup ...
>       temp <- apply-in-parallel(id,  per-subject-eval-fun, p=params)
>       unlist(temp)
>}
>
>The use of mclapply seems like there would be a lot of overhead starting
>and stopping threads.   But none of the tutorials I've found addresses
>this particular question.  Both direct answers and pointers to other
>docs would be welcome.
>
>Terry T.
>
>
>


