[R] project parallel help

Tue Oct 15 03:10:42 CEST 2013

Jeff:

Thank you for your response.  Please let me know how I can
"unhandicap" my question.  I tried my best to be concise.  Maybe this
will help:

> version
               _
platform       i386-w64-mingw32
arch           i386
os             mingw32
system         i386, mingw32
status
major          3
minor          0.2
year           2013
month          09
day            25
svn rev        63987
language       R
version.string R version 3.0.2 (2013-09-25)
nickname       Frisbee Sailing

I understand your comment about forking.  You are right that forking
is not available on Windows.
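
To make sure I have the platform point right, here is a minimal sketch
of how I understand the choice.  The function square and the worker
count are placeholders of mine, not my real code: on Windows I fall
back to a socket (PSOCK) cluster, elsewhere I could use mclapply.

```r
library(parallel)

n_workers <- 2L
square <- function(x) x^2   # hypothetical stand-in for the real work

if (.Platform$OS.type == "windows") {
  # No fork on Windows: start separate R processes connected by sockets.
  cl <- makeCluster(n_workers, type = "PSOCK")
  res <- parLapply(cl, 1:4, square)
  stopCluster(cl)
} else {
  # Unix-alikes: forked children share the parent's memory at fork time.
  res <- mclapply(1:4, square, mc.cores = n_workers)
}
```

Either branch should give the same list of results; only the way the
workers are created differs.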

What I am curious about is whether I can direct the execution of the
parallel package's functions so as to reduce the overhead.  My guess is
that the function to be executed and the data to be used are both
copied to the workers at each iteration.  Are there any paradigms in
the parallel package to reduce these overheads?  For instance, I could
use clusterExport to establish the function to be called.  But I don't
know if there is a technique whereby I could point to the data to be
used by each CPU so as to prevent a copy.
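
To make the clusterExport idea concrete, here is a rough sketch of what
I have in mind.  The names row_fun and mats, and the matrix sizes, are
made up for illustration.  It ships the function and all of the data to
the workers once, so that each task only carries an index.  This does
not prevent the copy, it only turns it into a one-time cost, and I have
not measured whether it actually reduces the elapsed time.

```r
library(parallel)

# Hypothetical stand-ins for my real function and data.
row_fun <- function(r) sum(r^2)
set.seed(1)
mats <- replicate(100, matrix(rnorm(1000 * 5), nrow = 1000),
                  simplify = FALSE)

cl <- makeCluster(2)

# Send the function and the full list of matrices to each worker once.
clusterExport(cl, c("row_fun", "mats"), envir = environment())

# Each task is now just an integer index; the worker looks up its own
# copy of the data.  One parLapply call replaces 100 parRapply calls.
res <- parLapply(cl, seq_along(mats),
                 function(i) apply(mats[[i]], 1, row_fun))

stopCluster(cl)
```

The trade-off is memory: every worker holds its own copy of all 100
matrices.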

Jeff

On Mon, Oct 14, 2013 at 2:35 PM, Jeff Newmiller
<jdnewmil at dcn.davis.ca.us> wrote:
> Your question misses on several points in the Posting Guide so any answers are handicapped by you.
>
> There is an overhead in using parallel processing, and the value of two cores is marginal at best. In general parallel by forking is more efficient than parallel by SNOW, but the former is not available on all operating systems. This is discussed in the vignette for the parallel package.
>
> Jeffrey Flint <jeffrey.flint at gmail.com> wrote:
>>I'm running package parallel in R-3.0.2.
>>
>>Below are the execution times, measured with system.time, when
>>executing serially versus in parallel (with 2 cores) using parRapply.
>>
>>
>>Serially:
>>   user  system elapsed
>>   4.67    0.03    4.71
>>
>>
>>
>>Using package parallel:
>>   user  system elapsed
>>   3.82    0.12    6.50
>>
>>
>>
>>There is evident improvement in the user cpu time, but a big jump in
>>the elapsed time.
>>
>>In my code, I am executing a function on a 1000-row matrix 100 times,
>>with the data different each time of course.
>>
>>The initial call to makeCluster cost 1.25 seconds in elapsed time.
>>I'm not concerned about the makeCluster time since that is a fixed
>>cost.  I am concerned about the additional 1.43 seconds in elapsed
>>time (6.50 = 3.82 + 1.25 + 1.43).
>>
>>I am wondering if there is a way to structure the code to largely
>>avoid the 1.43 second overhead.  For instance, perhaps I could
>>upload the function to both cores manually in order to avoid the
>>function being uploaded at each of the 100 iterations?  Also, I am
>>wondering if there is a way to avoid any copying that is occurring at
>>each of the 100 iterations?
>>
>>
>>Thank you.
>>
>>Jeff Flint
>>