[R] Appropriateness of R functions for multicore

Daniel Nordlund djnordlund at frontier.com
Tue Aug 20 01:53:43 CEST 2013


The R high-performance computing special interest group (R-SIG-HPC) mailing list might be a better place for some of these questions.
 
https://stat.ethz.ch/mailman/listinfo/r-sig-hpc


Dan

Daniel Nordlund
Bothell, WA USA
 

> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org]
> On Behalf Of Jeff Newmiller
> Sent: Monday, August 19, 2013 4:19 PM
> To: Patrick Connolly
> Cc: r-help at R-project.org; Hopkins,Bill
> Subject: Re: [R] Appropriateness of R functions for multicore
> 
> I don't know... I suppose it depends how it fails. I recommend that you
> restrict yourself to using only the data that was passed as parameters to
> your parallel function. You may be able to tackle parts of the task and
> return only those partial results to confirm how far through the code you
> can get.
> ---------------------------------------------------------------------------
> Jeff Newmiller                        The     .....       .....  Go Live...
> DCN:<jdnewmil at dcn.davis.ca.us>        Basics: ##.#.       ##.#.  Live Go...
>                                       Live:   OO#.. Dead: OO#..  Playing
> Research Engineer (Solar/Batteries            O.O#.       #.O#.  with
> /Software/Embedded Controllers)               .OO#.       .OO#.  rocks...1k
> ---------------------------------------------------------------------------
> Sent from my phone. Please excuse my brevity.
> 
> Patrick Connolly <p_connolly at slingshot.co.nz> wrote:
> >On Sat, 17-Aug-2013 at 05:09PM -0700, Jeff Newmiller wrote:
> >
> >
> >|> In most threaded multitasking environments it is not safe to
> >|> perform IO in multiple threads. In general you will have difficulty
> >|> performing IO in parallel processing so it is best to let the
> >|> master hand out data to worker tasks and gather results from them
> >|> for storage. Keep in mind that just because you have eight cores
> >|> for processing doesn't mean you have eight hard disks, so if your
> >|> problem is IO bound in single processor operation then it will also
> >|> be IO bound in threaded operation.
> >
> >For tasks which don't involve I/O but fail with mclapply, how does one
> >work out where the problem is?  The handy browser() function which
> >allows for interactive diagnosis won't work with parallel jobs.
> >
> >What other approaches can one use?
> >
> >Thanx
> >
> >|> "Hopkins, Bill" <Bill.Hopkins at Level3.com> wrote:
> >|> >Has there been any systematic evaluation of which core R functions are
> >|> >safe for use with multicore? Of current interest, I have tried calling
> >|> >read.table() via mclapply() to more quickly read in hundreds of raw
> >|> >data files (I have a 24-core system with 72 GB running Ubuntu, a
> >|> >perfect platform for multicore). There was a 40% failure rate, which
> >|> >doesn't occur when I invoke read.table() serially from within a single
> >|> >thread. Another example was using pvec() to invoke
> >|> >sapply(strsplit(),...) on a huge character vector (to pull out fields
> >|> >from within a field). It looked like a perfect application for pvec(),
> >|> >but it fails when serial execution works.
> >|> >
> >|> >I thought I'd ask before taking on the task of digging into the
> >|> >underlying code to see what might be causing failure in a multicore
> >|> >(well, multi-threaded) context.
> >|> >
> >|> >As an alternative, I could define multiple cluster nodes locally, but
> >|> >that shifts the tradeoff a bit in whether parallel execution is
> >|> >advantageous: the overhead is significantly more, and even with 72 GB,
> >|> >it does impose greater limits on how many cores can be used.
> >|> >
> >|> >Bill Hopkins
> >|> >
> >|> >______________________________________________
> >|> >R-help at r-project.org mailing list
> >|> >https://stat.ethz.ch/mailman/listinfo/r-help
> >|> >PLEASE do read the posting guide
> >|> >http://www.R-project.org/posting-guide.html
> >|> >and provide commented, minimal, self-contained, reproducible code.

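One way to act on the advice quoted above (work only on the data passed to the parallel function, and return partial results to see how far the code gets) is to wrap the worker in tryCatch() so that each task reports its own error message back to the master. The following is only a minimal sketch, not anything posted in the thread: risky_step() and safe_step() are hypothetical names, and the failure on negative input is made up to simulate a bug.

```r
library(parallel)

# Hypothetical worker: fails on negative input to simulate a bug.
risky_step <- function(x) {
  if (x < 0) stop("negative input: ", x)
  sqrt(x)
}

# Hypothetical wrapper: uses only its argument and returns either the
# value or the error message, so failures come back to the master as data.
safe_step <- function(x) {
  tryCatch(
    list(ok = TRUE, value = risky_step(x), error = NA_character_),
    error = function(e) {
      list(ok = FALSE, value = NA_real_, error = conditionMessage(e))
    }
  )
}

inputs <- c(4, -1, 9)

# mclapply() relies on forking, so fall back to one core on Windows.
cores <- if (.Platform$OS.type == "windows") 1L else 2L
results <- mclapply(inputs, safe_step, mc.cores = cores)

# Indices of the tasks that failed, and the messages they returned.
failed <- which(!vapply(results, function(r) r$ok, logical(1)))
for (i in failed) {
  message("task ", i, " failed: ", results[[i]]$error)
}
```

Once the failing inputs are known, the same worker can be run serially on just those inputs (for example risky_step(inputs[failed[1]])), where browser() and the other interactive tools work again.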

