[Rd] Is it a good choice to increase the NCONNECTION value?

GILLIBERT, Andre Andre.Gillibert sending from chu-rouen.fr
Wed Aug 25 23:00:56 CEST 2021


Hello,


The soft limit on the number of file descriptors is 1024 on GNU/Linux, but the default hard limit is 1048576 or 524288 on major modern distributions: Ubuntu, Fedora, Debian.
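For reference, both limits can be read from C with the POSIX getrlimit() call and RLIMIT_NOFILE. The sketch below is illustrative only; the helper names get_fd_limits(), print_limit() and print_fd_limits() are hypothetical and not part of R.

```c
#include <stdio.h>
#include <sys/resource.h>

/* Hypothetical helper: read the soft (rlim_cur) and hard (rlim_max) limits on
   open file descriptors of the calling process. Returns 0 on success, -1 on
   failure (errno is set by getrlimit()). */
static int get_fd_limits(rlim_t *soft, rlim_t *hard)
{
    struct rlimit rl;
    if (getrlimit(RLIMIT_NOFILE, &rl) != 0)
        return -1;
    *soft = rl.rlim_cur;
    *hard = rl.rlim_max;
    return 0;
}

/* Print one limit, spelling RLIM_INFINITY as "unlimited" like `ulimit` does. */
static void print_limit(const char *label, rlim_t value)
{
    if (value == RLIM_INFINITY)
        printf("%s: unlimited\n", label);
    else
        printf("%s: %llu\n", label, (unsigned long long) value);
}

/* Roughly the same information as `ulimit -Sn; ulimit -Hn` in a shell. */
static int print_fd_limits(void)
{
    rlim_t soft, hard;
    if (get_fd_limits(&soft, &hard) != 0) {
        perror("getrlimit");
        return -1;
    }
    print_limit("soft limit", soft);
    print_limit("hard limit", hard);
    return 0;
}
```

Because the values are read at run time, this reflects the limits of the running process rather than those of the build machine.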

I do not have access to a Macintosh, but it appears that the soft limit is 256 and the hard limit is "unlimited", although the effective hard limit has been reported as 10240 (https://developer.r-project.org/Blog/public/2018/03/23/maximum-number-of-dlls/index.html).


Therefore, R should be able to raise its soft limit with a call to setrlimit(), without superuser privileges, since an unprivileged process may set its soft limit to any value up to its hard limit.

This should make file descriptor exhaustion very unlikely, except in buggy programs that leak file descriptors.


The simplest approach would be to set the soft limit to the value of the hard limit. To be more conservative, R could instead set it to 10000 (or to the hard limit if that is lower), which should be enough for intensive use without consuming too many system resources in case of file descriptor leaks.
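A minimal sketch of the conservative variant, assuming it runs once at startup: compute min(10000, hard limit), never lower a soft limit that is already higher, and call setrlimit() only if something changes. The names PREFERRED_FD_LIMIT, target_soft_limit() and raise_fd_soft_limit() are hypothetical.

```c
#include <sys/resource.h>

/* Hypothetical policy constant, taken from the proposal above. */
#define PREFERRED_FD_LIMIT 10000

/* Pure helper: the soft limit we would like, given the current limits.
   It never exceeds the hard limit and never lowers a higher soft limit. */
static rlim_t target_soft_limit(rlim_t soft, rlim_t hard)
{
    rlim_t target = PREFERRED_FD_LIMIT;
    if (hard != RLIM_INFINITY && hard < target)
        target = hard;
    if (soft == RLIM_INFINITY || soft >= target)
        return soft;            /* already high enough: leave it alone */
    return target;
}

/* Raise the soft limit; no privileges are needed because the new soft
   limit stays within the hard limit. Returns 0 on success, -1 on error. */
static int raise_fd_soft_limit(void)
{
    struct rlimit rl;
    if (getrlimit(RLIMIT_NOFILE, &rl) != 0)
        return -1;
    rlim_t target = target_soft_limit(rl.rlim_cur, rl.rlim_max);
    if (target == rl.rlim_cur)
        return 0;               /* nothing to change */
    rl.rlim_cur = target;
    return setrlimit(RLIMIT_NOFILE, &rl);
}
```

Capping at a finite value also matters on macOS, whose setrlimit() man page says that an RLIM_INFINITY soft limit is rejected for RLIMIT_NOFILE and suggests min(OPEN_MAX, rlim_max) instead; 10000 stays below the reported 10240.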


To make R work reliably on more esoteric operating systems or on poorly configured systems (e.g. systems with a hard limit of 1024), a second safeguard could be added: a request for a new connection would be denied if the actual number of open file descriptors (or of connections, if that is easier to compute) is too close to the hard limit. A fixed number (e.g. 128) or a proportion (e.g. 25%) of file descriptors would be reserved for other uses, such as shared libraries.
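A sketch of such a safeguard, under two assumptions not in the proposal itself: the reserve is the larger of the fixed amount (128) and the proportion (25%), and open descriptors are counted portably by probing each one with fcntl(F_GETFD), which fails with EBADF for a closed descriptor. The names may_open_connection() and count_open_fds() are hypothetical.

```c
#include <fcntl.h>
#include <stdbool.h>

/* Hypothetical reserve policy: keep max(128, 25% of the limit)
   descriptors free for other uses (shared libraries, temporary files...). */
#define FD_RESERVE_FIXED   128
#define FD_RESERVE_PERCENT 25

/* Would opening one more connection eat into the reserve? */
static bool may_open_connection(long open_fds, long fd_limit)
{
    long reserve = fd_limit * FD_RESERVE_PERCENT / 100;
    if (reserve < FD_RESERVE_FIXED)
        reserve = FD_RESERVE_FIXED;
    return open_fds < fd_limit - reserve;
}

/* Portable but slow way to count open descriptors below fd_limit:
   fcntl(fd, F_GETFD) returns -1 for descriptors that are not open. */
static long count_open_fds(long fd_limit)
{
    long n = 0;
    for (long fd = 0; fd < fd_limit; fd++)
        if (fcntl((int) fd, F_GETFD) != -1)
            n++;
    return n;
}
```

With a limit of 1024, new connections would be refused from 768 open descriptors onwards. Probing every descriptor is O(limit), so with a hard limit near one million a cheaper count (e.g. R's own count of open connections, as suggested above) would be preferable.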


This discussion reminds me of the fixed number of file descriptors in MS-DOS, defined at boot time in config.sys (e.g. files=20).

It is incredible that 64-bit computers in 2021, with gigabytes of RAM, still have similar limits, and that R has a hard-coded limit of 128.


--

Sincerely

André GILLIBERT

________________________________
From: qweytr1 using mail.ustc.edu.cn <qweytr1 using mail.ustc.edu.cn>
Sent: Wednesday, 25 August 2021 06:15:59
To: Simon Urbanek
Cc: Martin Maechler; GILLIBERT, Andre; R-devel
Subject: Re: [SPAM] Re: [Rd] Is it a good choice to increase the NCONNECTION value?



Simon,

What about using dynamically allocated connections and a modifiable MAX_NCONNECTIONS limit?
ulimit can at least be modified by root users, whereas NCONNECTION currently cannot be modified at all.

I tried changing the code to allocate the connection table with malloc and realloc. Because I am unfamiliar with `.Internal` calls, I could not provide a function that modifies MAX_NCONNECTIONS (but it is possible).
The test and the changes are shown below. I would appreciate it if you could tell me whether there is a bug.

(A demo that could change MAX_NCONNECTIONS; not tested.)
static int SetMaxNconnections(int now){ // returns the new value of MAX_NCONNECTIONS
  if(now<3)error(_("cannot set MAX_NCONNECTIONS below 3"));
  if(now>65536)warning(_("setting MAX_NCONNECTIONS=%d (larger than 65536) may be unwise; use at your own risk"),now);
  // Raising MAX_NCONNECTIONS, even to a very large value, is safe because the table only grows on demand. Hence only a warning.
  if(now>=NCONNECTIONS)return MAX_NCONNECTIONS=now; // every slot in use is below NCONNECTIONS <= now, so nothing is dropped
  R_gc(); /* Try to reclaim unused connections */
  // Shrinking: slots at index >= now must be free. Find the highest slot still
  // in use (NCONNECTIONS-1 is the last valid index) and keep everything up to it.
  for(int i=NCONNECTIONS-1;i>=now;--i){ // now >= 3 here, so i never goes below 3
    if(Connections[i]){now=i+1;break;}
  }
  // No realloc here: the table of pointers only takes a few kilobytes. It is
  // resized the next time NextConnection() finds all NCONNECTIONS slots in use
  // while NCONNECTIONS < MAX_NCONNECTIONS.
  return MAX_NCONNECTIONS=NCONNECTIONS=now;
}
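The intended shrink rule (never drop a slot that is still in use, and never go below the three standard connections) can be checked outside R with a small standalone model. shrink_target() is a hypothetical name; slots 0 to 2 stand in for stdin, stdout and stderr, and a non-NULL entry means "in use".

```c
#include <stddef.h>

/* Hypothetical model of the shrink rule: given a table of n slots, return
   the smallest size >= requested (and >= 3) that keeps every non-NULL slot. */
static int shrink_target(void *const *slots, int n, int requested)
{
    if (requested < 3)
        requested = 3;
    if (requested >= n)
        return requested;            /* growing or unchanged: nothing to drop */
    for (int i = n - 1; i >= requested; --i)
        if (slots[i] != NULL)
            return i + 1;            /* highest slot still in use */
    return requested;                /* all slots >= requested are free */
}
```

Scanning downwards from the last valid index (n - 1) and stopping at the first slot in use gives the highest occupied index directly.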



test result:

$ LC_ALL=C R-4.1.1/bin/R -q -e 'library(doParallel);cl=makeForkCluster(128);max(sapply(clusterCall(cl,function()runif(10)),"+"))'
WARNING: ignoring environment value of R_HOME
> library(doParallel);cl=makeForkCluster(128);max(sapply(clusterCall(cl,function()runif(10)),"+"))
Loading required package: foreach
Loading required package: iterators
Loading required package: parallel
Warning messages:
1: In socketAccept(socket = socket, blocking = TRUE, open = "a+b",  :
  increase max connections from 16 to 32
2: In socketAccept(socket = socket, blocking = TRUE, open = "a+b",  :
  increase max connections from 32 to 64
3: In socketAccept(socket = socket, blocking = TRUE, open = "a+b",  :
  increase max connections from 64 to 128
4: In socketAccept(socket = socket, blocking = TRUE, open = "a+b",  :
  increase max connections from 128 to 256
[1] 0.9975836
>
>


tested changes:


~line 127

static int NCONNECTIONS=16; /* need one per cluster node; 16 is the
  initial value, which grows dynamically */
static int MAX_NCONNECTIONS=8192; /* increasing it only affects the speed of
  finding a free connection slot; if you have a machine with more than
  4096 threads, you could submit an issue or modify this value manually */
#define NSINKS 21

static Rconnection *Connections=NULL; /* we will allocate it later */
...

~line 146



static int NextConnection(void)
{
    int i;
    for(i = 3; i < NCONNECTIONS; i++)
        if(!Connections[i]) break;
    if(i >= NCONNECTIONS) {
        R_gc(); /* Try to reclaim unused ones */
        for(i = 3; i < NCONNECTIONS; i++)
            if(!Connections[i]) break;
        if(i >= NCONNECTIONS) {
            if(i >= MAX_NCONNECTIONS)
                error(_("all connections are in use"));
            /* Grow the table dynamically: double it, capped at MAX_NCONNECTIONS */
            int old_connections = NCONNECTIONS;
            int new_connections = NCONNECTIONS * 2;
            if(new_connections > MAX_NCONNECTIONS)
                new_connections = MAX_NCONNECTIONS;
            Rconnection *ptr = realloc(Connections, new_connections * sizeof(Rconnection));
            if(ptr == NULL) /* the old table is still valid */
                error(_("alloc extra connections failed"));
            /* Update the globals before calling warning(): if the warning is
               turned into an error (e.g. options(warn = 2)) it does not return,
               and Connections would otherwise point to freed memory. */
            Connections = ptr;
            NCONNECTIONS = new_connections;
            for(int j = old_connections; j < NCONNECTIONS; j++) Connections[j] = NULL;
            warning(_("increase max connections from %d to %d\n"), old_connections, new_connections);
        }
    }
    return i;
}
...
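The growth policy of NextConnection() (double the table, capped at MAX_NCONNECTIONS) can also be exercised outside R. This is a hypothetical standalone model using plain void pointers instead of Rconnection; grow_table() is not an R function. It updates the caller's pointer and size only after realloc() has succeeded, so a failure leaves the old table valid.

```c
#include <stdlib.h>

/* Hypothetical model: double *n (capped at max_n), realloc the table and
   NULL the new slots. Returns 0 on success, -1 if the cap is already reached
   or realloc fails; on failure *table and *n are unchanged and still valid. */
static int grow_table(void ***table, int *n, int max_n)
{
    if (*n >= max_n)
        return -1;
    int new_n = *n * 2;
    if (new_n > max_n)
        new_n = max_n;
    void **p = realloc(*table, (size_t) new_n * sizeof *p);
    if (p == NULL)
        return -1;                   /* old block still owned by *table */
    for (int j = *n; j < new_n; j++)
        p[j] = NULL;                 /* new slots start out free */
    *table = p;
    *n = new_n;
    return 0;
}
```

Starting from 16 slots, nine doublings reach the 8192 cap, which matches the 16, 32, 64, ... sequence of warnings in the test output.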



~line 5265

void attribute_hidden InitConnections()
{
    int i;
    Connections = malloc(NCONNECTIONS * sizeof(Rconnection));
    if(Connections == NULL) {
        error(_("Cannot alloc connections."));
        abort(); /* not reached: error() does not return */
    }
...


> -----Original Message-----
> From: "Simon Urbanek" <simon.urbanek using R-project.org>
> Sent: 2021-08-25 08:25:47 (Wednesday)
> To: "Martin Maechler" <maechler using stat.math.ethz.ch>
> Cc: "GILLIBERT, Andre" <Andre.Gillibert using chu-rouen.fr>, "qweytr1 using mail.ustc.edu.cn" <qweytr1 using mail.ustc.edu.cn>, R-devel <R-devel using r-project.org>
> Subject: [SPAM] Re: [Rd] Is it a good choice to increase the NCONNECTION value?
>
> Martin,
>
> I don't think a static connection limit is sensible. Recall that connections can be anything, not necessarily sockets or file descriptors, so they are not tied to the system fd limit. For example, if you use a codec, you will need twice as many connections as fds. To be honest, the connection limit is one of the main reasons why we have always avoided R connections in our big data applications and used C-level sockets instead (another was the lack of control over socket flags, but that has been addressed in the last release). So I'd vote for at the very least increasing the limit significantly (at least 1k, if not more) and, ideally, making it dynamic if memory footprint is an issue.
>
> Cheers,
> Simon
>
>
> > On Aug 25, 2021, at 8:53 AM, Martin Maechler <maechler using stat.math.ethz.ch> wrote:
> >
> >>>>>> GILLIBERT, Andre
> >>>>>>    on Tue, 24 Aug 2021 09:49:52 +0000 writes:
> >
> >> Rconnection is a pointer to an Rconn structure. The Rconn
> >> structure must be allocated independently (e.g. by
> >> malloc() in R_new_custom_connection). Therefore,
> >> increasing NCONNECTION to 1024 should only use 8
> >> kilobytes on 64-bit platforms and 4 kilobytes on 32-bit
> >> platforms.
> >
> > You are right indeed, and I was wrong.
> >
> >> Ideally, it should be dynamically allocated: either as
> >> a linked list or as a dynamic array
> >> (malloc/realloc). However, a simple change of
> >> NCONNECTION to 1024 should be enough for most uses.
> >
> > There is one other important problem I have been made aware of
> > (similar to the number of open DLL libraries, an issue 1-2
> > years ago):
> >
> > The OS itself has limits on the number of open files
> > (yes, I know that there are other connections than files) and
> > these limits may quite differ from platform to platform.
> >
> > On my Linux laptop, in a shell, I see
> >
> >  $ ulimit -n
> >  1024
> >
> > which is barely conformant with your proposed 1024 NCONNECTION.
> >
> > Now if NCONNECTION is larger than the maximum allowed number of
> > open files and R opens more files than the OS allows, the
> > user may get quite unpleasant behavior, e.g. R being terminated brutally
> > (or behaving crazily) without good R-level warning / error messages.
> >
> > It's also not at all sufficient to check the open files
> > limit at compile time; it must be checked at R process startup time.
> >
> > So this may need considerably more work than you / we have
> > hoped, and it's probably hard to find a safe number that is
> > considerably larger than 128  and less than the smallest of all
> > non-crazy platforms' {number of open files limit}.
> >
> >> Sincerely
> >> André GILLIBERT
> >
> >  [............]
> >
> > ______________________________________________
> > R-devel using r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-devel
>



