[R] Rsge: recursive parallelization

Martin Morgan mtmorgan at fhcrc.org
Sun Apr 11 00:02:32 CEST 2010


On 04/09/2010 08:52 AM, Peter Danenberg wrote:
> In principle, I'd like to be able to do something like this:
> 
>   sge.parLapply(seq(10), function(x) parLapply(seq(x), function(x) x^2))

I'm not sure that's such a good principle! With nested parallel calls it
would be hard to reason about which tasks are being executed, how many
processes there are, how load balancing works, etc. What about starting
with some complicated data structure that requires processing
  work <- lapply(seq(10), function(x) as.list(seq(x)))

Then making a flat list of tasks that need to be done

  idx0 <- rep(seq_along(work), sapply(work, length))
  idx1 <- unlist(lapply(work, seq_along))
  tasks <- mapply(c, idx0, idx1, SIMPLIFY=FALSE)

and actually doing the work in a single, easily parallelizable lapply

  answers <- lapply(tasks, function(t, w) w[[ t ]]^2, work)

(the idea here is that work[[ c(3, 2) ]] selects the third element of
the outer list, and then the second element of that element). You could
transform the result back into the original form with

  result <- work
  for (t in seq_along(tasks))
      result[[ tasks[[t]] ]] <- answers[[t]]
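
Because the work is now a single flat lapply, swapping in a parallel
version should be a one-line change. Here is a minimal sketch of the whole
round trip. It assumes the 'parallel' package that ships with current R
(makeCluster, parLapply, stopCluster) and two local worker processes as a
stand-in for Rsge; an sge.parLapply call would presumably slot into the
same place, but that is not tested here.

```r
library(parallel)  # assumption: local 'parallel' workers stand in for Rsge

# nested data structure that requires processing
work <- lapply(seq(10), function(x) as.list(seq(x)))

# flat list of tasks: one (outer index, inner index) pair per leaf
idx0 <- rep(seq_along(work), sapply(work, length))
idx1 <- unlist(lapply(work, seq_along))
tasks <- mapply(c, idx0, idx1, SIMPLIFY=FALSE)

# a single level of parallelism over all leaves;
# 'work' is passed along to each worker as the extra argument 'w'
cl <- makeCluster(2)
answers <- parLapply(cl, tasks, function(t, w) w[[ t ]]^2, work)
stopCluster(cl)

# put the answers back into the original nested shape
result <- work
for (t in seq_along(tasks))
    result[[ tasks[[t]] ]] <- answers[[t]]
```

With this, result[[ c(3, 2) ]] is 4, and there is only one pool of
workers to think about, whatever the nesting of the original data.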

Martin

> 
> In practice, however, I have to resort to acrobatics like this:
> 
>   sge.options(sge.remove.files=FALSE)
>   sge.options(sge.qsub.options='-cwd -V')
>   sge.parLapply(seq(10),
>                 function(x) {
>                   sge.options(sge.save.global=TRUE)
>                   sge.options(sge.remove.files=FALSE)
>                   sge.parLapply(seq(x),
>                                 function(x) x^2,
>                                 cluster=TRUE,
>                                 debug=FALSE,
>                                 trace=FALSE,
>                                 file.prefix='Rsge_data',
>                                 global.savelist=NULL,
>                                 packages=NULL)
>                 },
>                 function.savelist=c('sge.parLapply', 'sge.parParApply',
>                   'sge.options', 'sge.taskPrep'),
>                 global.savelist=c('sge.parParApply', 'sge.globalPrep',
>                   'global.savelist', 'sge.taskPrep', 'sge.checkNotNow',
>                   'sge.get.jobid', 'sge.get.result', 'docall',
>                   'enquote'),
>                 packages=NULL)
> 
> and I still get bizarre behavior: half of the results will be NULL,
> for instance; the other half, incomplete.
> 
> Would non-trivial changes to Rsge be required to make something like
> this possible?
> 


-- 
Martin Morgan
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M1 B861
Phone: (206) 667-2793
