[Rd] rmpi vs snow - which one is better from communication overhead point of view

Stephen Weston stephen.b.weston at gmail.com
Wed Jan 4 14:55:14 CET 2012


On Wed, Jan 4, 2012 at 4:57 AM, rdxcheena <rdxcheena at gmail.com> wrote:
> Hi,
>
> I need to understand when is it best to use /rmpi/ and when is it best to
> use /snow/ for parallel programming in R? I understand snow can be used for
> a group of non-clustered work stations also. But I wish to understand from
> the point of view of using both on clusters for a problem which has few
> chunks of straightforward data-parallelism interleaved with some
> communication. Since both are based on /mpi/, which one provides better
> performance for same kind of communication? Can I do explicit send, receive,

Snow uses MPI via the Rmpi package, so you can always write equivalent
code in Rmpi that is at least as fast as snow.  You might want to read the
paper "State of the Art in Parallel Computing with R" by Markus Schmidberger,
Martin Morgan, Dirk Eddelbuettel, Hao Yu, Luke Tierney, and Ulrich
Mansmann for more information on that subject.

> broadcast, etc with snow?

Snow doesn't provide any explicit communication operations, unless you
count clusterExport.
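
For illustration, here is a minimal sketch of what that looks like. It
assumes the parallel package bundled with R (2.14.0 or later), whose
cluster functions come from snow and keep the same names; the variable
"offset" and the worker count are made up for the example.

```r
library(parallel)  # assumption: snow-style API from the bundled package

cl <- makeCluster(2)   # start two local worker processes
offset <- 10           # hypothetical value that every worker needs

# Copy the master's 'offset' into the global environment of each worker.
# This is one-way (master to workers); workers cannot message each other.
clusterExport(cl, "offset")

# The function runs on the workers and can now see 'offset'.
res <- parLapply(cl, 1:4, function(i) i + offset)

stopCluster(cl)
print(unlist(res))
```

Everything else goes through the apply-style functions (clusterApply,
parLapply and friends), where the master hands out tasks and collects
the results.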

> Also, if I use /foreach/ on either of these, does this add further overhead?

Yes, foreach will definitely add overhead, and it doesn't give you access to
explicit communication either.

> Please help me understand the difference in the provisions of the two and
> select one of them for my current and future projects.

If you're primarily interested in performance, you should almost certainly
pick Rmpi.  And if you want to perform explicit MPI communication, such as
broadcasting, it's your only choice as far as I know.
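
As a rough sketch of the Rmpi side, assuming a working MPI installation
with support for spawning processes; the variable "scale" and the slave
count are made up for the example:

```r
library(Rmpi)

# Spawn two R slave processes under MPI (requires a working MPI setup).
mpi.spawn.Rslaves(nslaves = 2)

scale <- 3   # hypothetical value to broadcast

# Explicit broadcast: send the master's 'scale' object to every slave.
mpi.bcast.Robj2slave(scale)

# Evaluate an expression on each slave and gather the results on the
# master. Each slave multiplies its own MPI rank by the broadcast value.
res <- mpi.remote.exec(mpi.comm.rank() * scale)
print(res)

# Shut the slaves down again.
mpi.close.Rslaves()
```

Point-to-point transfers work along the same lines with
mpi.send.Robj and mpi.recv.Robj.  Call mpi.quit() instead of q() when
you are done with the R session so that MPI is finalized cleanly.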

- Steve


