[R] Appropriateness of R functions for multicore

Hopkins, Bill Bill.Hopkins at Level3.com
Sat Aug 17 21:48:39 CEST 2013


Has there been any systematic evaluation of which core R functions are safe for use with multicore? Of immediate interest, I have tried calling read.table() via mclapply() to read in hundreds of raw data files more quickly (I have a 24-core system with 72 GB of RAM running Ubuntu, a perfect platform for multicore). There was a 40% failure rate, which doesn't occur when I invoke read.table() serially in a single R session. Another example was using pvec() to invoke sapply(strsplit(), ...) on a huge character vector (to pull out sub-fields from within a field). It looked like a perfect application for pvec(), but it fails where serial execution works.
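For concreteness, the calls were along these lines (the file location, delimiter, and core counts below are placeholders, not the actual code):

library(parallel)

## Reading many raw files in parallel with mclapply(); elements whose
## FUN call errored come back as "try-error" objects rather than data frames.
files  <- list.files("data", pattern = "\\.txt$", full.names = TRUE)   # placeholder path
tables <- mclapply(files,
                   function(f) read.table(f, header = TRUE, stringsAsFactors = FALSE),
                   mc.cores = 20)
failed <- vapply(tables, inherits, logical(1), what = "try-error")

## Splitting a sub-field out of a huge character vector with pvec();
## FUN must return one result per input element, which sapply() over
## strsplit() does.
x <- c("a|1", "b|2", "c|3")   # stand-in for the real vector
second_field <- pvec(x,
                     function(chunk) sapply(strsplit(chunk, "|", fixed = TRUE), `[`, 2),
                     mc.cores = 20)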

I thought I'd ask before taking on the task of digging into the underlying code to see what might be causing the failures in a multicore (well, forked multi-process) context.

As an alternative, I could define multiple cluster nodes locally, but that shifts the tradeoff on whether parallel execution is advantageous: the overhead is significantly higher, and even with 72 GB of RAM, memory imposes tighter limits on how many cores can be used.
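For comparison, the cluster-based version would look roughly like this (again with a placeholder path and an arbitrary worker count):

library(parallel)

cl <- makeCluster(8)   # each worker is a separate R process with its own memory
files  <- list.files("data", pattern = "\\.txt$", full.names = TRUE)   # placeholder path
tables <- parLapply(cl, files,
                    function(f) read.table(f, header = TRUE, stringsAsFactors = FALSE))
stopCluster(cl)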

Bill Hopkins


