[Rd] default min-v/nsize parameters

Henrik Bengtsson hb at biostat.ucsf.edu
Tue Jan 20 19:58:08 CET 2015


Thanks for this.

Anyone know how I can find what those initial settings are from within
R?  Do I need to parse/look at both environment variables R_NSIZE and
R_VSIZE and then commandArgs()?

/Henrik

On Tue, Jan 20, 2015 at 1:42 AM, Martin Maechler
<maechler at stat.math.ethz.ch> wrote:
>>>>>> Peter Haverty <haverty.peter at gene.com>
>>>>>>     on Mon, 19 Jan 2015 08:50:08 -0800 writes:
>
>     > Hi All, This is a very important issue. It would be very
>     > sad to leave most users unaware of a free speedup of this
>     > size.  These options don't appear in the R --help
>     > output. They really should be added there.
>
> Indeed, I've found that myself and had added them there about
> 24 hours ago.
> ((I think they were accidentally dropped a while ago))
>
>     > if the garbage collector is working very hard, might it
>     > emit a note about better setting for these variables?
>
>     > It's not really my place to comment on design philosophy,
>     > but if there is a configure option for small memory
>     > machines I would assume that would be sufficient for the
>     > folks that are not on fairly current hardware.
>
> There are quite a few more issues with this,
> notably how the growth *steps* are done.
> That has been somewhat experimental and for that reason is
> _currently_ quite configurable via R_GC_* environment variables,
> see the code in src/main/memory.c
>
> This is currently discussed "privately" within the R core.
> I'm somewhat confident that R 3.2.0 in April will have changes.
>
> And -- coming back to the beginning -- at least the "R-devel" version now shows
>
> R --help | grep -e min-.size
>
>   --min-nsize=N         Set min number of fixed size obj's ("cons cells") to N
>   --min-vsize=N         Set vector heap minimum to N bytes; '4M' = 4 MegaB
>
> --
> Martin Maechler, ETH Zurich
>
>     > On Sat, Jan 17, 2015 at 11:40 PM, Nathan Kurz <nate at verse.com> wrote:
>
>     >> On Thu, Jan 15, 2015 at 3:55 PM, Michael Lawrence
>     >> <lawrence.michael at gene.com> wrote:
>     >> > Just wanted to start a discussion on whether R could ship with more
>     >> > appropriate GC parameters.
>     >>
>     >> I've been doing a number of similar measurements, and have come to the
>     >> same conclusion.  R is currently very conservative about memory usage,
>     >> and this leads to unnecessarily poor performance on certain problems.
>     >> Changing the defaults to sizes that are more appropriate for modern
>     >> machines can often produce a 2x speedup.
>     >>
>     >> On Sat, Jan 17, 2015 at 8:39 AM,  <luke-tierney at uiowa.edu> wrote:
>     >> > Martin Morgan discussed this a year or so ago and as I recall bumped
>     >> > up these values to the current defaults. I don't recall details about
>     >> > why we didn't go higher -- maybe Martin does.
>     >>
>     >> I just checked, and it doesn't seem that any of the relevant values
>     >> have been increased in the last ten years.  Do you have a link to the
>     >> discussion you recall so we can see why the changes weren't made?
>     >>
>     >> > I suspect the main concern would be with small memory machines in
>     >> student labs
>     >> > and less developed countries.
>     >>
>     >> While a reasonable concern, I'm doubtful there are many machines for
>     >> which the current numbers are optimal.  The current minimum size
>     >> increases for the node and vector heaps are 40KB and 80KB respectively.
>     >> These grow as the heap grows (min + .05 * heap), but that still means
>     >> we do many more expensive garbage collections while growing than we
>     >> need to.  Paradoxically, the SMALL_MEMORY compile option (which is
>     >> suggested for computers with up to 32MB of RAM) has slightly larger
>     >> increments, at 50KB and 100KB.
>     >>
>     >> I think we'd get significant benefit for most users by being less
>     >> conservative about memory consumption.    The exact sizes should be
>     >> discussed, but with RAM costing about $10/GB it doesn't seem
>     >> unreasonable to assume most machines running R have multiple GB
>     >> installed, and those that don't will quite likely be running an OS
>     >> that needs a custom compiled binary anyway.
>     >>
>     >> I could be way off, but a 10MB start with 1MB minimum increments
>     >> for SMALL_MEMORY, a 100MB start with 10MB increments for
>     >> NORMAL_MEMORY, and a 1GB start with 100MB increments for
>     >> LARGE_MEMORY might be a reasonable spread.
>     >>
>     >> Or one could go even larger, noting that on most systems,
>     >> overcommitted memory is not a problem until it is used.  Until we
>     >> write to it, it doesn't actually use physical RAM, just virtual
>     >> address space.  Or we could stay small, but make it possible to
>     >> programmatically increase the granularity from within R.
>     >>
>     >> For ease of reference, here are the relevant sections of code:
>     >>
>     >> https://github.com/wch/r-source/blob/master/src/include/Defn.h#L217
>     >> (ripley last authored on Jan 26, 2000 / pd last authored on May 8, 1999)
>     >> 217  #ifndef R_NSIZE
>     >> 218  #define R_NSIZE 350000L
>     >> 219  #endif
>     >> 220  #ifndef R_VSIZE
>     >> 221  #define R_VSIZE 6291456L
>     >> 222  #endif
>     >>
>     >> https://github.com/wch/r-source/blob/master/src/main/startup.c#L169
>     >> (ripley last authored on Jun 9, 2004)
>     >> 157 Rp->vsize = R_VSIZE;
>     >> 158 Rp->nsize = R_NSIZE;
>     >> 166  #define Max_Nsize 50000000 /* about 1.4Gb 32-bit, 2.8Gb 64-bit */
>     >> 167  #define Max_Vsize R_SIZE_T_MAX /* unlimited */
>     >> 169  #define Min_Nsize 220000
>     >> 170  #define Min_Vsize (1*Mega)
>     >>
>     >> https://github.com/wch/r-source/blob/master/src/main/memory.c#L335
>     >> (luke last authored on Nov 1, 2000)
>     >> 335  #ifdef SMALL_MEMORY
>     >> 336  /* On machines with only 32M of memory (or on a classic Mac OS port)
>     >> 337     it might be a good idea to use settings like these that are more
>     >> 338     aggressive at keeping memory usage down. */
>     >> 339  static double R_NGrowIncrFrac = 0.0, R_NShrinkIncrFrac = 0.2;
>     >> 340  static int R_NGrowIncrMin = 50000, R_NShrinkIncrMin = 0;
>     >> 341  static double R_VGrowIncrFrac = 0.0, R_VShrinkIncrFrac = 0.2;
>     >> 342  static int R_VGrowIncrMin = 100000, R_VShrinkIncrMin = 0;
>     >> 343  #else
>     >> 344  static double R_NGrowIncrFrac = 0.05, R_NShrinkIncrFrac = 0.2;
>     >> 345  static int R_NGrowIncrMin = 40000, R_NShrinkIncrMin = 0;
>     >> 346  static double R_VGrowIncrFrac = 0.05, R_VShrinkIncrFrac = 0.2;
>     >> 347  static int R_VGrowIncrMin = 80000, R_VShrinkIncrMin = 0;
>     >> 348  #endif
>     >>
>     >> static void AdjustHeapSize(R_size_t size_needed)
>     >> {
>     >>     R_size_t R_MinNFree = (R_size_t)(orig_R_NSize * R_MinFreeFrac);
>     >>     R_size_t R_MinVFree = (R_size_t)(orig_R_VSize * R_MinFreeFrac);
>     >>     R_size_t NNeeded = R_NodesInUse + R_MinNFree;
>     >>     R_size_t VNeeded = R_SmallVallocSize + R_LargeVallocSize +
>     >>         size_needed + R_MinVFree;
>     >>     double node_occup = ((double) NNeeded) / R_NSize;
>     >>     double vect_occup = ((double) VNeeded) / R_VSize;
>     >>
>     >>     if (node_occup > R_NGrowFrac) {
>     >>         R_size_t change = (R_size_t)(R_NGrowIncrMin + R_NGrowIncrFrac * R_NSize);
>     >>         if (R_MaxNSize >= R_NSize + change)
>     >>             R_NSize += change;
>     >>     }
>     >>     else if (node_occup < R_NShrinkFrac) {
>     >>         R_NSize -= (R_NShrinkIncrMin + R_NShrinkIncrFrac * R_NSize);
>     >>         if (R_NSize < NNeeded)
>     >>             R_NSize = (NNeeded < R_MaxNSize) ? NNeeded : R_MaxNSize;
>     >>         if (R_NSize < orig_R_NSize)
>     >>             R_NSize = orig_R_NSize;
>     >>     }
>     >>
>     >>     if (vect_occup > 1.0 && VNeeded < R_MaxVSize)
>     >>         R_VSize = VNeeded;
>     >>     if (vect_occup > R_VGrowFrac) {
>     >>         R_size_t change = (R_size_t)(R_VGrowIncrMin + R_VGrowIncrFrac * R_VSize);
>     >>         if (R_MaxVSize - R_VSize >= change)
>     >>             R_VSize += change;
>     >>     }
>     >>     else if (vect_occup < R_VShrinkFrac) {
>     >>         R_VSize -= R_VShrinkIncrMin + R_VShrinkIncrFrac * R_VSize;
>     >>         if (R_VSize < VNeeded)
>     >>             R_VSize = VNeeded;
>     >>         if (R_VSize < orig_R_VSize)
>     >>             R_VSize = orig_R_VSize;
>     >>     }
>     >>
>     >>     DEBUG_ADJUST_HEAP_PRINT(node_occup, vect_occup);
>     >> }
>     >>
>     >> Rp->nsize is overridden at startup by environment variable R_NSIZE if
>     >> Min_Nsize <= $R_NSIZE <= Max_Nsize.  Rp->vsize is overridden at
>     >> startup by environment variable R_VSIZE if Min_Vsize <= $R_VSIZE <=
>     >> Max_Vsize.  These are then used to set the global variables R_NSize
>     >> and R_VSize, with R_SetMaxVSize(Rp->max_vsize).
>     >>
>
> ______________________________________________
> R-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel


