[R] socket problems (maybe bugs?)

Luke Tierney luke at stat.uiowa.edu
Sat Feb 19 23:49:07 CET 2005


On Sat, 19 Feb 2005, Luke Tierney wrote:

> On Thu, 17 Feb 2005, Christian Lederer wrote:
>
>> Dear R Gurus,
>> 
>> for some purpose i have to use a socket connection, where i have to read
>> and write both text and binary data (each binary data package will be
>> preceded by a header line).
>> When experimenting, i encountered some problems (with R-2.0.1 under
>> different Linuxes (SuSE and Gentoo)).
>> 
>> Since the default mode for socket connections is non-blocking,
>> i first tried socketSelect() in order to see whether the socket is ready
>> for reading:
>> 
>> # Server:
>> s <- socketConnection(port=2222, server=TRUE, open="w+b")
>> writeLines("test", s)
>> writeBin(1:10, s, size=4, endian="big")
>> 
>> # Client, variation 1:
>> s <- socketConnection(port=2222, server=FALSE, open="w+b")
>> socketSelect(list(s))
>> readLines(s, n=1)     # works, "test" is read
>> socketSelect(list(s)) # never returns, although the server wrote 1:10
>> 
>> (This seems to happen only, when i mix text and binary reads.)
>> However, without socketSelect(), R may crash if i try to read from an
>> empty socket:
>> 
>> # Server:
>> s <- socketConnection(port=2222, server=TRUE, open="w+b")
>> writeLines("test", s)
>> writeBin(1:10, s, size=4, endian="big")
>> 
>> # Client, variation 2:
>> s <- socketConnection(port=2222, server=FALSE, open="w+b")
>> readLines(s, n=1)                              # works, "test" is read
>> readBin(s, "int", size=4, n=10, endian="big")  # works, 1:10 is read
>> readBin(s, "int", size=4, n=10, endian="big")  # second read leads to
>>                                               # segmentation fault
>> 
>> If i omit the endian="big" option, the second read does not crash, but
>> just gets 10 random numbers.
>> 
>> At first glance, this does not seem to be a problem, since the
>> data will be preceded by a header, which contains the number of
>> bytes in the binary block.
>> However, due to race conditions, i cannot exclude this situation:
>> 
>> time    server             client
>> t0      sends header
>> t1                         reads header
>> t2                         tries to read binary, crashes
>> t3      sends binary
>> 
>> 
>> If i open the client socket in blocking mode, the second variation seems
>> to work (the second read just blocks as desired).
>> When using only one socket, i can do without socketSelect(), but
>> i have the following questions:
>> 
>> 1. Can i be sure that the blocking variation will also work for larger
>> data sets, when e.g. the server starts writing before the client is
>> reading?
>> 
>> 2. How could i proceed, if i needed several sockets?
>> Then i cannot use socketSelect due to the problem described in
>> variation 1.
>> I also cannot use blocking sockets, since reading from an empty socket
>> would block the others.
>> Without blocking and socketSelect(), i might run into the race condition
>> described above.
>> 
>> In any case, the readBin() crash with endian="big" is a bug in
>> my eyes. For non-blocking sockets, readBin() should just return numeric(0),
>> if no data are written on the socket.
>> I also strongly suspect that the socketSelect() behaviour as described in
>> variation 1 is a bug.
>
> Thanks for the report and the examples.  Both issues are bugs.
>
> The crash is due to the fact that a low level routine
> (sock_read_helper) correctly marks the connection as incomplete and
> returns -EAGAIN as its result but the next higher routine (sock_read)
> treats the result as a character count, converts it to an unsigned
> value on return, and bad things happen the third level up
> (do_readbin).  I'm not quite sure whether the best fix is to change
> sock_read_helper to return 0 or to have sock_read do some checking
> on the result it gets from sock_read_helper.
>
> The issue with socketSelect is that socketSelect ought to return
> immediately if buffered input is available but it does not.  As a
> result, when you execute both writes before the first read then the
> read will read all available input and store the part it does not use;
> socketSelect then waits for _additional_ input which never comes.
> This should be fixed in R-devel soon.
>
> I always use blocking reads and writes with sockets--it's a lot easier
> than trying to figure out how to deal with incomplete reads or writes.
> You need to make sure to use a protocol that guarantees that a reader
> will read what a writer writes before the writer needs to move on.  If
> you don't then you get deadlock with blocking writes and data large
> enough to fill the buffer.  Using non-blocking sockets doesn't cure
> the problem, it just changes the symptoms.
>
> I use socketSelect in the socket version of my snow package for the
> load balanced cluster apply to detect the first slave to finish its
> work.  In my setup the final write/read pairs in each communication
> exchange are binary.  With the current implementation this ensures
> that the read completely empties the buffer and so this problem does
> not bite.  It sounds like the same strategy should allow you to work
> with the current implementation.
>
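
The socketSelect usage described above (waiting for the first of several
sockets to have input) might look roughly like the sketch below.  It is
only an illustration and is not code from snow: first_ready and its
select argument are hypothetical names, and select exists only so that
the helper can be exercised without live sockets.

```r
# Hypothetical helper (not part of R or snow): block until at least one
# connection in `socks` has input available, then return the index of the
# first ready one, or NA if the timeout expires first.
# `select` defaults to base R's socketSelect(); it is a parameter only so
# the logic can be tried with a stub instead of real sockets.
first_ready <- function(socks, timeout = NULL, select = socketSelect) {
  ready <- select(socks, write = FALSE, timeout = timeout)
  if (any(ready)) which(ready)[1] else NA_integer_
}

# Stub standing in for socketSelect(): pretends sockets 2 and 3 are ready.
fake_select <- function(socklist, write, timeout) c(FALSE, TRUE, TRUE)
idx <- first_ready(list("a", "b", "c"), select = fake_select)

# Stub that reports nothing ready, as after a timeout.
none <- first_ready(list("a", "b"), timeout = 1,
                    select = function(socklist, write, timeout) c(FALSE, FALSE))
```

With real connections one would call first_ready(list(s1, s2, s3)) and
then read from the socket at the returned index.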

Both the socketSelect problem and the segfault in reading from
non-blocking sockets are now fixed in R-devel.
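
For the header-plus-binary protocol discussed in this thread, a minimal
sketch of the blocking approach could look like the following.  The
helper names write_message and read_message and the "INT <n>" header
format are assumptions made for illustration; they are not part of R.
The demonstration uses in-memory raw connections so that it runs in a
single session; with sockets the same helpers would be called on
connections opened with blocking=TRUE.

```r
# Hypothetical helpers: each message is a text header line "INT <n>"
# followed by n big-endian 4-byte integers.
write_message <- function(con, x) {
  x <- as.integer(x)
  writeLines(sprintf("INT %d", length(x)), con)  # header announces the count
  writeBin(x, con, size = 4, endian = "big")     # binary payload
  invisible(length(x))
}

read_message <- function(con) {
  header <- readLines(con, n = 1)
  if (length(header) != 1 || !grepl("^INT [0-9]+$", header))
    stop("malformed or missing header")
  n <- as.integer(sub("^INT ", "", header))
  x <- readBin(con, "integer", n = n, size = 4, endian = "big")
  # On a blocking connection a short read suggests the peer closed early.
  if (length(x) != n) stop("short read: expected ", n, ", got ", length(x))
  x
}

# Single-session demonstration on in-memory connections.
out <- rawConnection(raw(0), "wb")
write_message(out, 1:10)
bytes <- rawConnectionValue(out)
close(out)

inp <- rawConnection(bytes, "rb")
result <- read_message(inp)
close(inp)

# With sockets (two separate R sessions), assuming port 2222 as above:
#   s <- socketConnection(port = 2222, server = TRUE,  blocking = TRUE, open = "w+b")
#   s <- socketConnection(port = 2222, server = FALSE, blocking = TRUE, open = "w+b")
```

Because the reader learns the payload length from the header and then
blocks until that many integers arrive, it cannot run ahead of the
writer in the way the race condition above describes.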

Best,

luke

-- 
Luke Tierney
University of Iowa                  Phone:             319-335-3386
Department of Statistics and        Fax:               319-335-3017
    Actuarial Science
241 Schaeffer Hall                  email:      luke at stat.uiowa.edu
Iowa City, IA 52242                 WWW:  http://www.stat.uiowa.edu



