[Rd] parallel PSOCK connection latency is greater on Linux?

Simon Urbanek simon.urbanek using r-project.org
Wed Nov 4 02:06:37 CET 2020


I'm not sure the user would know ;). This is a very system-specific issue, simply because the Linux network stack behaves so differently from other OSes (for purely historical reasons). That makes it hard to abstract as a "feature" for R sockets, which are supposed to be platform-independent. At least TCP_NODELAY is actually part of POSIX, so it is on better footing, and disabling delayed ACK is practically only useful to work around the other side having Nagle on, so I would expect it to be rarely used.

This is essentially an RFC, since we don't have a mechanism for socket options (well, almost: there is timeout and blocking already...) and I don't think we want to expose low-level details. Perhaps one idea would be to add something like delay=NA to socketConnection() in order to not touch (NA), enable (TRUE) or disable (FALSE) TCP_NODELAY. I wonder if there is any other way we could infer the intention of the user to try to choose the right approach...
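
To make the three-way idea concrete, here is a minimal C sketch of how such a tri-state setting might map onto setsockopt(). It is an illustration only and not R's actual code: the enum, the function names sock_apply_nodelay() and sock_get_nodelay(), and the choice to read TRUE as "enable TCP_NODELAY" (as the sentence above literally states) are all assumptions.

```c
/* Hypothetical sketch: tri-state control of TCP_NODELAY on a TCP socket.
 * Not R's implementation; all names here are invented for illustration. */
#include <netinet/in.h>   /* IPPROTO_TCP */
#include <netinet/tcp.h>  /* TCP_NODELAY */
#include <sys/socket.h>   /* socket, setsockopt, getsockopt */

/* Stand-ins for R's NA / FALSE / TRUE (hypothetical names). */
enum nodelay_mode { NODELAY_KEEP = -1, NODELAY_OFF = 0, NODELAY_ON = 1 };

/* Apply the requested mode to socket fd.
 * Returns 0 on success (including "keep"), -1 if setsockopt() fails. */
int sock_apply_nodelay(int fd, enum nodelay_mode mode)
{
    if (mode == NODELAY_KEEP)
        return 0;                     /* NA: leave the OS default untouched */

    int flag = (mode == NODELAY_ON);  /* 1 turns Nagle off, 0 turns it back on */
    return setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &flag, sizeof flag);
}

/* Read back the current setting: 1 if set, 0 if not set, -1 on error. */
int sock_get_nodelay(int fd)
{
    int flag = 0;
    socklen_t len = sizeof flag;

    if (getsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &flag, &len) != 0)
        return -1;
    return flag != 0;
}
```

Under these assumptions, calling sock_apply_nodelay(fd, NODELAY_ON) on a freshly created socket would roughly correspond to enabling TCP_NODELAY in R_SockConnect(), while NODELAY_KEEP preserves today's behaviour.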

Cheers,
Simon


> On Nov 3, 2020, at 02:28, Jeff <jeff using vtkellers.com> wrote:
> 
> Could TCP_NODELAY and TCP_QUICKACK be exposed to the R user so that they might determine what is best for their potentially latency- or throughput-sensitive application?
> 
> Best,
> Jeff
> 
> On Mon, Nov 2, 2020 at 14:05, Iñaki Ucar <iucar using fedoraproject.org> wrote:
>> On Mon, 2 Nov 2020 at 02:22, Simon Urbanek <simon.urbanek using r-project.org> wrote:
>>> It looks like R sockets on Linux could do with TCP_NODELAY -- without (status quo):
>> How many network packets are generated with and without it? If there
>> are many small writes and thus setting TCP_NODELAY causes many small
>> packets to be sent, it might make more sense to set TCP_QUICKACK
>> instead.
>> Iñaki
>>> Unit: microseconds
>>>                    expr      min       lq     mean  median       uq      max neval
>>>  clusterEvalQ(cl, iris) 1449.997 43991.99 43975.21 43997.1 44001.91 48027.83  1000
>>> exactly the same machine + R but with TCP_NODELAY enabled in R_SockConnect():
>>> Unit: microseconds
>>>                    expr     min     lq     mean  median      uq      max neval
>>>  clusterEvalQ(cl, iris) 156.125 166.41 180.8806 170.247 174.298 5322.234  1000
>>> Cheers,
>>> Simon
>>> > On 2/11/2020, at 3:39 AM, Jeff <jeff using vtkellers.com> wrote:
>>> >
>>> > I'm exploring latency overhead of parallel PSOCK workers and noticed that serializing/unserializing data back to the main R session is significantly slower on Linux than it is on Windows/MacOS with similar hardware. Is there a reason for this difference and is there a way to avoid the apparent additional Linux overhead?
>>> >
>>> > I attempted to isolate the behavior with a test that simply returns an existing object from the worker back to the main R session.
>>> >
>>> > library(parallel)
>>> > library(microbenchmark)
>>> > gcinfo(TRUE)
>>> > cl <- makeCluster(1)
>>> > (x <- microbenchmark(clusterEvalQ(cl, iris), times = 1000, unit = "us"))
>>> > plot(x$time, ylab = "microseconds")
>>> > head(x$time, n = 10)
>>> >
>>> > On Windows/MacOS, the test runs in 300-500 microseconds depending on hardware. A few of the 1000 runs are an order of magnitude slower but this can probably be attributed to garbage collection on the worker.
>>> >
>>> > On Linux, the first 5 or so executions run at comparable speeds but all subsequent executions are two orders of magnitude slower (~40 milliseconds).
>>> >
>>> > I see this behavior across various platforms and hardware combinations:
>>> >
>>> > Ubuntu 18.04 (Intel Xeon Platinum 8259CL)
>>> > Linux Mint 19.3 (AMD Ryzen 7 1800X)
>>> > Linux Mint 20 (AMD Ryzen 7 3700X)
>>> > Windows 10 (AMD Ryzen 7 4800H)
>>> > MacOS 10.15.7 (Intel Core i7-8850H)
>>> >
>>> > ______________________________________________
>>> > R-devel using r-project.org mailing list
>>> > https://stat.ethz.ch/mailman/listinfo/r-devel
>>> >
>> --
>> Iñaki Úcar