[Rd] default min-v/nsize parameters

Martin Maechler maechler at stat.math.ethz.ch
Tue Jan 20 10:42:27 CET 2015


>>>>> Peter Haverty <haverty.peter at gene.com>
>>>>>     on Mon, 19 Jan 2015 08:50:08 -0800 writes:

    > Hi All, This is a very important issue. It would be very
    > sad to leave most users unaware of a free speedup of this
    > size.  These options don't appear in the R --help
    > output. They really should be added there.

Indeed, I found that myself and added them there about
24 hours ago.
(I think they were accidentally dropped a while ago.)

    > If the garbage collector is working very hard, might it
    > emit a note about better settings for these variables?

    > It's not really my place to comment on design philosophy,
    > but if there is a configure option for small memory
    > machines I would assume that would be sufficient for the
    > folks that are not on fairly current hardware.

There are quite a few more issues with this,
notably how the growth *steps* are done.
That has been somewhat experimental and for that reason is
_currently_ quite configurable via R_GC_* environment variables,
see the code in src/main/memory.c

This is currently being discussed "privately" within R core.
I'm somewhat confident that R 3.2.0 in April will have changes.

And -- coming back to the beginning -- at least the "R-devel" version now shows 

R --help | grep -e min-.size

  --min-nsize=N         Set min number of fixed size obj's ("cons cells") to N
  --min-vsize=N         Set vector heap minimum to N bytes; '4M' = 4 MegaB

--
Martin Maechler, ETH Zurich

    > On Sat, Jan 17, 2015 at 11:40 PM, Nathan Kurz <nate at verse.com> wrote:

    >> On Thu, Jan 15, 2015 at 3:55 PM, Michael Lawrence
    >> <lawrence.michael at gene.com> wrote:
    >> > Just wanted to start a discussion on whether R could ship with more
    >> > appropriate GC parameters.
    >> 
    >> I've been doing a number of similar measurements, and have come to the
    >> same conclusion.  R is currently very conservative about memory usage,
    >> and this leads to unnecessarily poor performance on certain problems.
    >> Changing the defaults to sizes that are more appropriate for modern
    >> machines can often produce a 2x speedup.
    >> 
    >> On Sat, Jan 17, 2015 at 8:39 AM,  <luke-tierney at uiowa.edu> wrote:
    >> > Martin Morgan discussed this a year or so ago and as I recall bumped
    >> > up these values to the current defaults. I don't recall details about
    >> > why we didn't go higher -- maybe Martin does.
    >> 
    >> I just checked, and it doesn't seem that any of the relevant values
    >> have been increased in the last ten years.  Do you have a link to the
    >> discussion you recall so we can see why the changes weren't made?
    >> 
    >> > I suspect the main concern would be with small memory machines in
    >> > student labs and less developed countries.
    >> 
    >> While a reasonable concern, I'm doubtful there are many machines for
    >> which the current numbers are optimal.  The current minimum size
    >> increases for node and vector heaps are 40KB and 80KB respectively.
    >> This grows as the heap grows (min + .05 * heap), but still means that
    >> we do many more expensive garbage collections while growing than we
    >> need to.  Paradoxically, the SMALL_MEMORY compile option (which is
    >> suggested for computers with up to 32MB of RAM) has slightly larger
    >> minimum increments, at 50KB and 100KB.
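
To make the cost of small increments concrete, here is a minimal C sketch that counts how many growth steps a heap needs to reach a target size when each step adds min + frac * heap, as described above. The name growth_steps and the use of double for sizes are assumptions made for illustration. This is not code from R, and it ignores the occupancy thresholds that decide when a step is actually taken.

```c
#include <assert.h>

/* Hypothetical helper (not part of R): count the growth steps needed to
   take a heap from `start` to at least `target` when every step adds
   incr_min + incr_frac * current_size. */
static int growth_steps(double start, double target,
                        double incr_min, double incr_frac)
{
    /* A positive minimum increment guarantees that the loop terminates. */
    assert(incr_min > 0.0 && incr_frac >= 0.0);

    int steps = 0;
    double size = start;
    while (size < target) {
        size += incr_min + incr_frac * size;
        steps++;
    }
    return steps;
}
```

Comparing growth_steps(6291456, 629145600, 80000, 0.05) with the same call using a minimum increment of 8000000 gives a rough feel for how much a larger minimum reduces the number of steps, each of which would typically follow a garbage collection.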
    >> 
    >> I think we'd get significant benefit for most users by being less
    >> conservative about memory consumption.    The exact sizes should be
    >> discussed, but with RAM costing about $10/GB it doesn't seem
    >> unreasonable to assume most machines running R have multiple GB
    >> installed, and those that don't will quite likely be running an OS
    >> that needs a custom compiled binary anyway.
    >> 
    >> I could be way off, but my suggestion would be a 10MB start with 1MB
    >> minimum increments for SMALL_MEMORY, a 100MB start with 10MB
    >> increments for NORMAL_MEMORY, and a 1GB start with 100MB increments
    >> for LARGE_MEMORY.
    >> 
    >> Or one could go even larger, noting that on most systems,
    >> overcommitted memory is not a problem until it is used.  Until we
    >> write to it, it doesn't actually use physical RAM, just virtual
    >> address space.  Or we could stay small, but make it possible to
    >> programmatically increase the granularity from within R.
    >> 
    >> For ease of reference, here are the relevant sections of code:
    >> 
    >> https://github.com/wch/r-source/blob/master/src/include/Defn.h#L217
    >> (ripley last authored on Jan 26, 2000 / pd last authored on May 8, 1999)
    >> #ifndef R_NSIZE
    >> #define R_NSIZE 350000L
    >> #endif
    >> #ifndef R_VSIZE
    >> #define R_VSIZE 6291456L
    >> #endif
    >> 
    >> https://github.com/wch/r-source/blob/master/src/main/startup.c#L169
    >> (ripley last authored on Jun 9, 2004)
    >> Rp->vsize = R_VSIZE;
    >> Rp->nsize = R_NSIZE;
    >>
    >> #define Max_Nsize 50000000 /* about 1.4Gb 32-bit, 2.8Gb 64-bit */
    >> #define Max_Vsize R_SIZE_T_MAX /* unlimited */
    >> #define Min_Nsize 220000
    >> #define Min_Vsize (1*Mega)
    >> 
    >> https://github.com/wch/r-source/blob/master/src/main/memory.c#L335
    >> (luke last authored on Nov 1, 2000)
    >> #ifdef SMALL_MEMORY
    >> /* On machines with only 32M of memory (or on a classic Mac OS port)
    >>    it might be a good idea to use settings like these that are more
    >>    aggressive at keeping memory usage down. */
    >> static double R_NGrowIncrFrac = 0.0, R_NShrinkIncrFrac = 0.2;
    >> static int R_NGrowIncrMin = 50000, R_NShrinkIncrMin = 0;
    >> static double R_VGrowIncrFrac = 0.0, R_VShrinkIncrFrac = 0.2;
    >> static int R_VGrowIncrMin = 100000, R_VShrinkIncrMin = 0;
    >> #else
    >> static double R_NGrowIncrFrac = 0.05, R_NShrinkIncrFrac = 0.2;
    >> static int R_NGrowIncrMin = 40000, R_NShrinkIncrMin = 0;
    >> static double R_VGrowIncrFrac = 0.05, R_VShrinkIncrFrac = 0.2;
    >> static int R_VGrowIncrMin = 80000, R_VShrinkIncrMin = 0;
    >> #endif
    >> 
    >> static void AdjustHeapSize(R_size_t size_needed)
    >> {
    >>     R_size_t R_MinNFree = (R_size_t)(orig_R_NSize * R_MinFreeFrac);
    >>     R_size_t R_MinVFree = (R_size_t)(orig_R_VSize * R_MinFreeFrac);
    >>     R_size_t NNeeded = R_NodesInUse + R_MinNFree;
    >>     R_size_t VNeeded = R_SmallVallocSize + R_LargeVallocSize +
    >>         size_needed + R_MinVFree;
    >>     double node_occup = ((double) NNeeded) / R_NSize;
    >>     double vect_occup = ((double) VNeeded) / R_VSize;
    >>
    >>     if (node_occup > R_NGrowFrac) {
    >>         R_size_t change = (R_size_t)(R_NGrowIncrMin + R_NGrowIncrFrac * R_NSize);
    >>         if (R_MaxNSize >= R_NSize + change)
    >>             R_NSize += change;
    >>     }
    >>     else if (node_occup < R_NShrinkFrac) {
    >>         R_NSize -= (R_NShrinkIncrMin + R_NShrinkIncrFrac * R_NSize);
    >>         if (R_NSize < NNeeded)
    >>             R_NSize = (NNeeded < R_MaxNSize) ? NNeeded: R_MaxNSize;
    >>         if (R_NSize < orig_R_NSize)
    >>             R_NSize = orig_R_NSize;
    >>     }
    >>
    >>     if (vect_occup > 1.0 && VNeeded < R_MaxVSize)
    >>         R_VSize = VNeeded;
    >>     if (vect_occup > R_VGrowFrac) {
    >>         R_size_t change = (R_size_t)(R_VGrowIncrMin + R_VGrowIncrFrac * R_VSize);
    >>         if (R_MaxVSize - R_VSize >= change)
    >>             R_VSize += change;
    >>     }
    >>     else if (vect_occup < R_VShrinkFrac) {
    >>         R_VSize -= R_VShrinkIncrMin + R_VShrinkIncrFrac * R_VSize;
    >>         if (R_VSize < VNeeded)
    >>             R_VSize = VNeeded;
    >>         if (R_VSize < orig_R_VSize)
    >>             R_VSize = orig_R_VSize;
    >>     }
    >>
    >>     DEBUG_ADJUST_HEAP_PRINT(node_occup, vect_occup);
    >> }
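
The grow/shrink decision in the function quoted above can be hard to follow with all the globals in play, so here is a simplified, standalone C sketch of the same idea for a single heap. The struct heap_policy, its field names, and the function adjust_heap are hypothetical and exist only for this illustration. The sketch follows the shape of the node-heap branch and leaves out the extra vect_occup > 1.0 case of the vector heap.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical parameter bundle; the field names are illustrative and do
   not exist in R's sources. */
typedef struct {
    double grow_frac;        /* grow when occupancy exceeds this */
    double shrink_frac;      /* shrink when occupancy falls below this */
    double grow_incr_frac;   /* proportional part of a growth step */
    size_t grow_incr_min;    /* fixed part of a growth step */
    double shrink_incr_frac; /* proportional part of a shrink step */
    size_t shrink_incr_min;  /* fixed part of a shrink step */
    size_t orig_size;        /* startup size; never shrink below it */
    size_t max_size;         /* never grow beyond it */
} heap_policy;

/* Return the new heap size given the current size and the amount needed. */
static size_t adjust_heap(size_t size, size_t needed, const heap_policy *p)
{
    assert(size > 0);
    double occup = (double) needed / (double) size;

    if (occup > p->grow_frac) {
        /* Growth step: fixed minimum plus a fraction of the current size. */
        size_t change = (size_t)(p->grow_incr_min + p->grow_incr_frac * size);
        if (size <= p->max_size && p->max_size - size >= change)
            size += change;
    } else if (occup < p->shrink_frac) {
        /* Shrink step, clamped so we keep what is needed and never drop
           below the startup size. */
        size_t change = (size_t)(p->shrink_incr_min + p->shrink_incr_frac * size);
        size = (change < size) ? size - change : 0;
        if (size < needed)
            size = needed;
        if (size < p->orig_size)
            size = p->orig_size;
    }
    return size;
}
```

With the non-SMALL_MEMORY node increments quoted above (minimum 40000, fraction 0.05) and assumed thresholds of 0.70 for growing and 0.30 for shrinking (these threshold values are not stated in this thread), a heap of 350000 nodes that needs 300000 would grow by 40000 + 0.05 * 350000 = 57500 nodes in one step.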
    >> 
    >> Rp->nsize is overridden at startup by environment variable R_NSIZE if
    >> Min_Nsize <= $R_NSIZE <= Max_Nsize.  Rp->vsize is overridden at
    >> startup by environment variable R_VSIZE if Min_Vsize <= $R_VSIZE <=
    >> Max_Vsize.  These are then used to set the global variables R_NSize
    >> and R_Vsize with R_SetMaxVSize(Rp->max_vsize).
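
The bounded override described above can be sketched in C as follows. This is a minimal illustration under stated assumptions, not R's startup code: the functions size_from_string and size_from_env are hypothetical, and the sketch accepts only plain decimal integers, whereas R's own parsing may also accept size suffixes such as the '4M' form shown in the --help output.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical helper: parse `s` as a plain decimal integer and return it
   only if it lies within [min_val, max_val]; otherwise return `fallback`.
   Suffixes, signs and trailing characters are rejected in this sketch. */
static unsigned long long size_from_string(const char *s,
                                           unsigned long long fallback,
                                           unsigned long long min_val,
                                           unsigned long long max_val)
{
    assert(min_val <= max_val);
    if (s == NULL || !isdigit((unsigned char) s[0]))
        return fallback;

    errno = 0;
    char *end = NULL;
    unsigned long long v = strtoull(s, &end, 10);
    if (errno != 0 || end == s || *end != '\0')
        return fallback;            /* overflow or trailing characters */
    if (v < min_val || v > max_val)
        return fallback;            /* outside the permitted range */
    return v;
}

/* Hypothetical helper: apply the same rule to an environment variable,
   keeping the compiled-in default when the variable is unset or invalid. */
static unsigned long long size_from_env(const char *name,
                                        unsigned long long fallback,
                                        unsigned long long min_val,
                                        unsigned long long max_val)
{
    return size_from_string(getenv(name), fallback, min_val, max_val);
}
```

Using the constants quoted above, a call such as size_from_env("R_NSIZE", 350000ULL, 220000ULL, 50000000ULL) would keep the default of 350000 unless the variable holds an integer between Min_Nsize and Max_Nsize.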
    >>


