[Rd] parallel PSOCK connection latency is greater on Linux?

Jeff je|| @end|ng |rom vtke||er@@com
Sun Nov 1 15:39:55 CET 2020


I'm exploring latency overhead of parallel PSOCK workers and noticed 
that serializing/unserializing data back to the main R session is 
significantly slower on Linux than it is on Windows/macOS with similar
hardware. Is there a reason for this difference and is there a way to 
avoid the apparent additional Linux overhead?

I attempted to isolate the behavior with a test that simply returns an 
existing object from the worker back to the main R session.

library(parallel)
library(microbenchmark)

gcinfo(TRUE)  # report garbage collections in the main session

cl <- makeCluster(1)  # a single local PSOCK worker

# Time 1000 round trips that return an existing object from the worker
(x <- microbenchmark(clusterEvalQ(cl, iris), times = 1000, unit = "us"))

plot(x$time, ylab = "microseconds")
head(x$time, n = 10)

stopCluster(cl)

On Windows/macOS, the test runs in 300-500 microseconds depending on 
hardware. A few of the 1000 runs are an order of magnitude slower, but 
this can probably be attributed to garbage collection on the worker.
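One way to check that hypothesis is to enable GC tracing in the worker itself rather than in the main session: if the outliers are worker-side collections, the worker's gcinfo() messages should coincide with the slow iterations. A minimal sketch (the 5000-microsecond outlier threshold and the 100 iterations are arbitrary choices, not from the original test):

```r
library(parallel)

cl <- makeCluster(1)
clusterEvalQ(cl, gcinfo(TRUE))  # worker prints a line at each collection

# Time each round trip by hand so outliers can be matched against
# the worker's GC messages as they appear.
for (i in 1:100) {
  t0 <- proc.time()[["elapsed"]]
  invisible(clusterEvalQ(cl, iris))
  dt <- (proc.time()[["elapsed"]] - t0) * 1e6  # microseconds
  if (dt > 5000) message(sprintf("iteration %d: %.0f us", i, dt))
}

stopCluster(cl)
```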

On Linux, the first 5 or so executions run at comparable speeds but all 
subsequent executions are two orders of magnitude slower (~40 
milliseconds).
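One way to narrow down where the extra ~40 ms might come from is to vary the payload size: if serialization/unserialization were the bottleneck, the per-call cost should grow with the object, whereas a roughly constant floor regardless of size would point at the connection layer instead. A rough sketch along those lines (the object sizes and 50 repetitions are arbitrary choices):

```r
library(parallel)

cl <- makeCluster(1)

for (n in c(1e2, 1e4, 1e6)) {
  payload <- raw(n)  # an n-byte object to ship to the worker and back
  t0 <- proc.time()[["elapsed"]]
  for (i in 1:50) invisible(clusterCall(cl, identity, payload))
  # mean microseconds per round trip at this payload size
  dt <- (proc.time()[["elapsed"]] - t0) / 50 * 1e6
  message(sprintf("%8.0f bytes: ~%.0f us per round trip", n, dt))
}

stopCluster(cl)
```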

I see this behavior across various platforms and hardware combinations:

Ubuntu 18.04 (Intel Xeon Platinum 8259CL)
Linux Mint 19.3 (AMD Ryzen 7 1800X)
Linux Mint 20 (AMD Ryzen 7 3700X)
Windows 10 (AMD Ryzen 7 4800H)
macOS 10.15.7 (Intel Core i7-8850H)


