[R] Appropriateness of R functions for multicore

Hopkins, Bill Bill.Hopkins at Level3.com
Tue Aug 20 04:13:26 CEST 2013


I wrap functions to run via multicore with tryCatch() to gather stats on failure rate and capture state.
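
A minimal sketch of that kind of wrapper, assuming the parallel package (which ships with R and provides mclapply()). The names safe_call and risky are hypothetical, and the toy worker only stands in for a real task:

```r
library(parallel)

# Hypothetical wrapper: run f(x), but return a record instead of raising,
# so the master can count failures and see which input triggered them.
safe_call <- function(f) {
  function(x, ...) {
    tryCatch(
      list(ok = TRUE, value = f(x, ...), input = x, error = NULL),
      error = function(e) {
        list(ok = FALSE, value = NULL, input = x, error = conditionMessage(e))
      }
    )
  }
}

# Toy worker that fails on negative input (stands in for the real task).
risky <- function(x) {
  if (x < 0) stop("negative input: ", x)
  sqrt(x)
}

inputs <- c(4, -1, 9, -16, 25)

# mc.cores > 1 is not supported on Windows, so fall back to 1 there.
cores <- if (.Platform$OS.type == "windows") 1L else 2L
results <- mclapply(inputs, safe_call(risky), mc.cores = cores)

# Gather failure statistics and the captured state on the master.
failed <- !vapply(results, function(r) r$ok, logical(1))
failure_rate <- mean(failed)
failed_inputs <- vapply(results[failed], function(r) r$input, numeric(1))
```

Note that tryCatch() only sees R-level errors. A worker process that is killed outright will not return a record at all, so that case would need separate handling.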



I'm still interested in how/whether core functions were verified as being thread-safe.



Bill Hopkins



Written using a virtual Android keyboard...

------ Original message ------
From: Jeff Newmiller
Date: 8/19/2013 5:18 PM
To: Patrick Connolly;
Cc: Hopkins, Bill;r-help at R-project.org;
Subject: Re: [R] Appropriateness of R functions for multicore

I don't know... I suppose it depends how it fails. I recommend that you restrict yourself to using only the data that was passed as parameters to your parallel function. You may be able to tackle parts of the task and return only those partial results to confirm how far through the code you can get.
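
One way to read that suggestion, sketched under assumptions: the worker uses only its argument and returns the name of the last stage it completed, along with whatever it had computed by then. The function staged_worker, the stage names, and the toy parsing task are all hypothetical:

```r
library(parallel)

# Hypothetical worker: touches only its argument and reports how far it got.
staged_worker <- function(record) {
  out <- list(stage = "start", fields = NULL, total = NULL, error = NULL)
  tryCatch({
    out$fields <- strsplit(record, ",", fixed = TRUE)[[1]]
    out$stage <- "split"

    nums <- suppressWarnings(as.numeric(out$fields))
    if (anyNA(nums)) stop("non-numeric field")
    out$stage <- "parsed"

    out$total <- sum(nums)
    out$stage <- "done"
    out
  }, error = function(e) {
    # Return the partial result, including the last stage reached.
    out$error <- conditionMessage(e)
    out
  })
}

records <- c("1,2,3", "4,x,6")

# mc.cores > 1 is not supported on Windows, so fall back to 1 there.
cores <- if (.Platform$OS.type == "windows") 1L else 2L
res <- mclapply(records, staged_worker, mc.cores = cores)

# The stage reached by each task shows where a failing task stopped.
stages <- vapply(res, function(r) r$stage, character(1))
```

Tabulating stages on the master (for example with table(stages)) narrows down which step fails without needing browser() inside the workers.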
---------------------------------------------------------------------------
Jeff Newmiller                        The     .....       .....  Go Live...
DCN:<jdnewmil at dcn.davis.ca.us>        Basics: ##.#.       ##.#.  Live Go...
                                      Live:   OO#.. Dead: OO#..  Playing
Research Engineer (Solar/Batteries            O.O#.       #.O#.  with
/Software/Embedded Controllers)               .OO#.       .OO#.  rocks...1k
---------------------------------------------------------------------------
Sent from my phone. Please excuse my brevity.

Patrick Connolly <p_connolly at slingshot.co.nz> wrote:
>On Sat, 17-Aug-2013 at 05:09PM -0700, Jeff Newmiller wrote:
>
>
>|> In most threaded multitasking environments it is not safe to
>|> perform IO in multiple threads. In general you will have difficulty
>|> performing IO in parallel processing so it is best to let the
>|> master hand out data to worker tasks and gather results from them
>|> for storage. Keep in mind that just because you have eight cores
>|> for processing doesn't mean you have eight hard disks, so if your
>|> problem is IO bound in single processor operation then it will also
>|> be IO bound in threaded operation.
>
>For tasks which don't involve I/O but fail with mclapply, how does one
>work out where the problem is?  The handy browser() function which
>allows for interactive diagnosis won't work with parallel jobs.
>
>What other approaches can one use?
>
>Thanx
>
>|> "Hopkins, Bill" <Bill.Hopkins at Level3.com> wrote:
>|> >Has there been any systematic evaluation of which core R functions
>|> >are safe for use with multicore? Of current interest, I have tried
>|> >calling read.table() via mclapply() to more quickly read in hundreds
>|> >of raw data files (I have a 24 core system with 72 GB running
>|> >Ubuntu, a perfect platform for multicore). There was a 40% failure
>|> >rate, which doesn't occur when I invoke read.table() serially from
>|> >within a single thread. Another example was using pvec() to invoke
>|> >sapply(strsplit(),...) on a huge character vector (to pull out
>|> >fields from within a field). It looked like a perfect application
>|> >for pvec(), but it fails when serial execution works.
>|> >
>|> >I thought I'd ask before taking on the task of digging into the
>|> >underlying code to see what might be causing failure in a multicore
>|> >(well, multi-threaded) context.
>|> >
>|> >As an alternative, I could define multiple cluster nodes locally,
>|> >but that shifts the tradeoff a bit in whether parallel execution is
>|> >advantageous - the overhead is significantly more, and even with 72
>|> >GB, it does impose greater limits on how many cores can be used.
>|> >
>|> >Bill Hopkins
>|> >
>|> >______________________________________________
>|> >R-help at r-project.org mailing list
>|> >https://stat.ethz.ch/mailman/listinfo/r-help
>|> >PLEASE do read the posting guide
>|> >http://www.R-project.org/posting-guide.html
>|> >and provide commented, minimal, self-contained, reproducible code.


