[Rd] Is it a good choice to increase the NCONNECTION value?

GILLIBERT, Andre Andre.Gillibert using chu-rouen.fr
Thu Aug 26 19:31:56 CEST 2021


> as stated earlier, R already uses setrlimit() to raise the limit (see my earlier reply).


Currently, setrlimit() is called by R_EnsureFDLimit() and the latter is called by initLoadedDLL() in Rdynload.c depending on the R_MAX_NUM_DLLS environment variable.

R_MAX_NUM_DLLS can be at most 1000 and the ulimit is raised to ceil(R_MAX_NUM_DLLS/0.6), which is at most 1667.

That seems pretty low to me.


I would not mind calling R_EnsureFDLimit(10000) unconditionally. Ten thousand file descriptors should be "cheap" enough in system resources on any modern system (probably less than a hundred megabytes even if kernel structures are large), and should be enough for moderate-to-intensive uses.
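
As a minimal sketch of what raising the soft limit to a target such as 10000 could look like (this is not the actual R_EnsureFDLimit() code; the helper name ensure_fd_limit and its return convention are assumptions made for illustration):

```c
#include <sys/resource.h>

/* Hypothetical helper, NOT R's R_EnsureFDLimit(): try to raise the soft
   RLIMIT_NOFILE limit to `desired`, capped at the hard limit.
   Returns `desired` if the soft limit was already high enough, the new
   soft limit if it was raised, or -1 on failure. */
long ensure_fd_limit(long desired)
{
    struct rlimit rl;

    if (desired < 0 || getrlimit(RLIMIT_NOFILE, &rl) != 0)
        return -1;

    rlim_t want = (rlim_t) desired;

    /* Already high enough: leave the limit alone. */
    if (rl.rlim_cur == RLIM_INFINITY || rl.rlim_cur >= want)
        return desired;

    /* An unprivileged process cannot go above the hard limit. */
    if (rl.rlim_max != RLIM_INFINITY && want > rl.rlim_max)
        want = rl.rlim_max;

    rl.rlim_cur = want;
    if (setrlimit(RLIMIT_NOFILE, &rl) != 0)
        return -1;   /* e.g. a platform cap below the reported hard limit */

    return (long) want;
}
```

A caller would use something like ensure_fd_limit(10000) once at startup and treat -1 as "keep the current limit".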

> As for "special" connections, that is not feasible (without some serious re-write), since the connection doesn't know what it is used for and connections are not the only way descriptors may be used.


I was thinking about something like /proc/$PID/fd on Linux to enumerate file descriptors of the current process, independently of who created them.

However, I did not find a fast and portable way of doing that. If thousands of file descriptors are open, we cannot afford to do thousands of system calls every time a new connection is created.


Anyway, I wrote and tested a patch with the following features:

1) Dynamic allocation of the Connections array with a MAX_NCONNECTIONS limit

2) MAX_NCONNECTIONS defaults to 8192

3) MAX_NCONNECTIONS can be set at startup by an environment variable R_MAX_NCONNECTIONS

4) MAX_NCONNECTIONS can be read and changed at run time via options("max.n.connections")

5) R_EnsureFDLimit(10000) is called unconditionally at startup
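
To make points 2 and 3 concrete, here is a hedged sketch of how a startup value could be derived from R_MAX_NCONNECTIONS. It is not taken from the patch: the helper names, the fallback to the default on invalid input, and the lower bound of 3 (mirroring the bound used in the quoted code further down) are all assumptions.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

#define DEFAULT_MAX_NCONNECTIONS 8192
#define MIN_MAX_NCONNECTIONS 3   /* assumed lower bound, as in the quoted code */

/* Hypothetical helper: turn the text of R_MAX_NCONNECTIONS into a limit.
   Missing, malformed or out-of-range values fall back to the default. */
int parse_max_nconnections(const char *value)
{
    if (value == NULL || *value == '\0')
        return DEFAULT_MAX_NCONNECTIONS;

    char *end;
    errno = 0;
    long n = strtol(value, &end, 10);

    if (errno != 0 || end == value || *end != '\0' ||
        n < MIN_MAX_NCONNECTIONS || n > INT_MAX)
        return DEFAULT_MAX_NCONNECTIONS;

    return (int) n;
}

/* Hypothetical startup hook: read the environment variable once. */
int max_nconnections_from_env(void)
{
    return parse_max_nconnections(getenv("R_MAX_NCONNECTIONS"));
}
```

Splitting the parsing from the getenv() call keeps the validation testable without touching the process environment.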


--

Sincerely

André GILLIBERT

________________________________
From: Simon Urbanek <simon.urbanek using R-project.org>
Sent: Thursday, 26 August 2021 01:27:51
To: GILLIBERT, Andre
Cc: qweytr1 using mail.ustc.edu.cn; R-devel; Martin Maechler
Subject: Re: [Rd] Is it a good choice to increase the NCONNECTION value?



Andre,

as stated earlier, R already uses setrlimit() to raise the limit (see my earlier reply).

As for "special" connections, that is not feasible (without some serious re-write), since the connection doesn't know what it is used for and connections are not the only way descriptors may be used.

Anyway, I think the take away was that likely the best way forward is to make it configurable at startup time with possible option to check that value against the feasibility of open connections.

Cheers,
Simon



> On Aug 26, 2021, at 9:00 AM, GILLIBERT, Andre <Andre.Gillibert using chu-rouen.fr> wrote:
>
> Hello,
>
>
> The soft limit on the number of file descriptors is 1024 on GNU/Linux, but the default hard limit is 1048576 or 524288 on major modern distributions: Ubuntu, Fedora, Debian.
>
> I do not have access to a Macintosh, but it looks like the soft limit is 256 and hard limit is "unlimited", though actually, the real hard limit has been reported as 10240 (https://developer.r-project.org/Blog/public/2018/03/23/maximum-number-of-dlls/index.html).
>
>
> Therefore, R should easily be able to change the limit without superuser privileges, with a call to setrlimit().
>
> This should make file descriptor exhaustion very unlikely, except for buggy programs leaking file descriptors.
>
>
> The simplest approach would be to set the soft limit to the value of the hard limit. Maybe to be nicer, R could set it to 10000 (or the hard limit if lower), which should be enough for intensive uses but would not use too much system resources in case of file descriptor leaks.
>
>
> To get R to work reliably on more esoteric operating systems or on poorly configured systems (e.g. systems with a hard limit of 1024), a second safeguard could be added: a request for a new connection would be denied if the actual number of open file descriptors (or connections, if that is easier to compute) is too close to the hard limit. A fixed amount (e.g. 128) or a proportion (e.g. 25%) of file descriptors would be reserved for "other uses", such as shared libraries.
>
>
> This discussion reminds me of the fixed number of file descriptors of MS-DOS, defined at boot time in config.sys (e.g. files=20).
>
> It is incredible that 64-bit computers in 2021 with gigabytes of RAM still have similar limits, and that R has a hard-coded limit of 128.
>
>
> --
>
> Sincerely
>
> André GILLIBERT
>
> ________________________________
> From: qweytr1 using mail.ustc.edu.cn <qweytr1 using mail.ustc.edu.cn>
> Sent: Wednesday, 25 August 2021 06:15:59
> To: Simon Urbanek
> Cc: Martin Maechler; GILLIBERT, Andre; R-devel
> Subject: Re: [SPAM] Re: [Rd] Is it a good choice to increase the NCONNECTION value?
>
>
>
> Simon,
>
> What about using a dynamically allocated connections array and a modifiable MAX_NCONNECTIONS limit?
> The ulimit can at least be modified by root users, whereas NCONNECTION currently cannot be changed at all.
>
> I tried changing the program to allocate the memory with malloc and realloc. Because I am unfamiliar with `.Internal` calls, I could not provide a function that modifies MAX_NCONNECTIONS (but it is possible).
> The test and changes are shown below. I would appreciate it if you could tell me whether there could be a bug.
>
> (a demo that may change MAX_NCONNECTIONS, not tested.)
> static int SetMaxNconnections(int now) /* returns the new value of MAX_NCONNECTIONS */
> {
>     if (now < 3)
>         error(_("Could not shrink the MAX_NCONNECTIONS less than 3"));
>     if (now > 65536)
>         warning(_("Setting MAX_NCONNECTIONS=%d, larger than 65536, may be crazy. Use at your own risk."), now);
>     /* Setting MAX_NCONNECTIONS to a really large value is safe, since the
>        allocation is not done immediately. Thus this is only a warning. */
>     if (now >= NCONNECTIONS)
>         return MAX_NCONNECTIONS = now; /* no allocated slot lies beyond the new limit, so it is safe */
>     R_gc(); /* Try to reclaim unused connections */
>     /* Do not shrink below the highest slot still in use.
>        Valid indices end at NCONNECTIONS - 1; now >= 3 here, so no underflow occurs. */
>     for (int i = NCONNECTIONS - 1; i >= now; --i) {
>         if (Connections[i]) { now = i + 1; break; }
>     }
>     /* We could call realloc here, but since *Connections only takes a few kilobytes, it seems pointless. */
>     /* A real realloc is triggered when NCONNECTIONS < MAX_NCONNECTIONS and NextConnection is called with all connections in use. */
>     return MAX_NCONNECTIONS = NCONNECTIONS = now;
> }
>
>
>
> test result:
>
> $ LC_ALL=C R-4.1.1/bin/R -q -e 'library(doParallel);cl=makeForkCluster(128);max(sapply(clusterCall(cl,function()runif(10)),"+"))'
> WARNING: ignoring environment value of R_HOME
>> library(doParallel);cl=makeForkCluster(128);max(sapply(clusterCall(cl,function()runif(10)),"+"))
> Loading required package: foreach
> Loading required package: iterators
> Loading required package: parallel
> Warning messages:
> 1: In socketAccept(socket = socket, blocking = TRUE, open = "a+b",  :
>  increase max connections from 16 to 32
> 2: In socketAccept(socket = socket, blocking = TRUE, open = "a+b",  :
>  increase max connections from 32 to 64
> 3: In socketAccept(socket = socket, blocking = TRUE, open = "a+b",  :
>  increase max connections from 64 to 128
> 4: In socketAccept(socket = socket, blocking = TRUE, open = "a+b",  :
>  increase max connections from 128 to 256
> [1] 0.9975836
>>
>>
>
>
> tested changes:
>
>
> ~line 127
>
> static int NCONNECTIONS=16; /* need one per cluster node, 16 is the
>  initial value which grows dynamically */
> static int MAX_NCONNECTIONS=8192; /* increasing it only affects the speed of
>  finding a free connection; if you have a machine with more than
>  4096 threads, you could submit an issue or modify this value manually */
> #define NSINKS 21
>
> static Rconnection *Connections=NULL; /* we will allocate it later */
> ...
>
> ~line 146
>
>
>
> static int NextConnection(void)
> {
>     int i;
>     for(i = 3; i < NCONNECTIONS; i++)
>         if(!Connections[i]) break;
>     if(i >= NCONNECTIONS) {
>         R_gc(); /* Try to reclaim unused ones */
>         for(i = 3; i < NCONNECTIONS; i++)
>             if(!Connections[i]) break;
>         if(i >= NCONNECTIONS) {
>             if(i >= MAX_NCONNECTIONS)
>                 error(_("all connections are in use"));
>             int new_connections = NCONNECTIONS*2; /* try dynamic alloc */
>             if(new_connections > MAX_NCONNECTIONS)
>                 new_connections = MAX_NCONNECTIONS;
>             Rconnection *ptr = realloc(Connections, new_connections*sizeof(Rconnection));
>             if(ptr == NULL)
>                 error(_("alloc extra connections failed"));
>             warning(_("increase max connections from %d to %d\n"), NCONNECTIONS, new_connections);
>             Connections = ptr;
>             NCONNECTIONS = new_connections;
>             for(int j = i; j < NCONNECTIONS; j++) Connections[j] = NULL;
>         }
>     }
>     return i;
> }
> ...
>
>
>
> ~line 5265
>
> void attribute_hidden InitConnections()
> {
>     int i;
>     Connections = malloc(NCONNECTIONS*sizeof(Rconnection));
>     if(Connections == NULL) {
>         error(_("Cannot alloc connections."));
>         abort();
>     }
> ...
>
>
>> -----Original Message-----
>> From: "Simon Urbanek" <simon.urbanek using R-project.org>
>> Sent: 2021-08-25 08:25:47 (Wednesday)
>> To: "Martin Maechler" <maechler using stat.math.ethz.ch>
>> Cc: "GILLIBERT, Andre" <Andre.Gillibert using chu-rouen.fr>, "qweytr1 using mail.ustc.edu.cn" <qweytr1 using mail.ustc.edu.cn>, R-devel <R-devel using r-project.org>
>> Subject: [SPAM] Re: [Rd] Is it a good choice to increase the NCONNECTION value?
>>
>> Martin,
>>
>> I don't think a static connection limit is sensible. Recall that connections can be anything, not necessarily just sockets or file descriptors, so they are not linked to the system fd limit. For example, if you use a codec then you will need twice as many connections as fds. To be honest, the connection limit is one of the main reasons why in our big data applications we have always avoided R connections and used C-level sockets instead (another was the lack of control over the socket flags, but that has been addressed in the last release). So I'd vote for at the very least increasing the limit significantly (at least 1k if not more) and, ideally, making it dynamic if memory footprint is an issue.
>>
>> Cheers,
>> Simon
>>
>>
>>> On Aug 25, 2021, at 8:53 AM, Martin Maechler <maechler using stat.math.ethz.ch> wrote:
>>>
>>>>>>>> GILLIBERT, Andre
>>>>>>>>   on Tue, 24 Aug 2021 09:49:52 +0000 writes:
>>>
>>>> RConnection is a pointer to a Rconn structure. The Rconn
>>>> structure must be allocated independently (e.g. by
>>>> malloc() in R_new_custom_connection).  Therefore,
>>>> increasing NCONNECTION to 1024 should only use 8
>>>> kilobytes on 64-bits platforms and 4 kilobytes on 32
>>>> bits platforms.
>>>
>>> You are right indeed, and I was wrong.
>>>
>>>> Ideally, it should be dynamically allocated : either as
>>>> a linked list or as a dynamic array
>>>> (malloc/realloc). However, a simple change of
>>>> NCONNECTION to 1024 should be enough for most uses.
>>>
>>> There is another important problem I've been made aware of
>>> (similar to the number of open DLL libraries, an issue 1-2
>>> years ago) :
>>>
>>> The OS itself has limits on the number of open files
>>> (yes, I know that there are other connections than files) and
>>> these limits may quite differ from platform to platform.
>>>
>>> On my Linux laptop, in a shell, I see
>>>
>>> $ ulimit -n
>>> 1024
>>>
>>> which is barely conformant with your proposed 1024 NCONNECTION.
>>>
>>> Now if NCONNECTION is larger than the max allowed number of
>>> open files and if R opens more files than the OS allowed, the
>>> user may get quite unpleasant behavior, e.g. R being terminated brutally
>>> (or behaving crazily) without good R-level warning / error messages.
>>>
>>> It's also not at all sufficient to check for the open files
>>> limit at compile time, but rather at R process startup time
>>>
>>> So this may need considerably more work than you / we have
>>> hoped, and it's probably hard to find a safe number that is
>>> considerably larger than 128  and less than the smallest of all
>>> non-crazy platforms' {number of open files limit}.
>>>
>>>> Sincerely
>>>> André GILLIBERT
>>>
>>> [............]
>>>
>>> ______________________________________________
>>> R-devel using r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-devel
>>
>



