[Rd] default min-v/nsize parameters

Peter Haverty haverty.peter at gene.com
Mon Jan 19 17:50:08 CET 2015


Hi All,

This is a very important issue. It would be very sad to leave most users
unaware of a free speedup of this size.  These options don't appear in the
R --help output. They really should be added there. Additionally, if the
garbage collector is working very hard, might it emit a note about better
settings for these variables?

It's not really my place to comment on design philosophy, but if there is a
configure option for small memory machines I would assume that would be
sufficient for the folks that are not on fairly current hardware.

Regards,


Pete

____________________
Peter M. Haverty, Ph.D.
Genentech, Inc.
phaverty at gene.com

On Sat, Jan 17, 2015 at 11:40 PM, Nathan Kurz <nate at verse.com> wrote:

> On Thu, Jan 15, 2015 at 3:55 PM, Michael Lawrence
> <lawrence.michael at gene.com> wrote:
> > Just wanted to start a discussion on whether R could ship with more
> > appropriate GC parameters.
>
> I've been doing a number of similar measurements, and have come to the
> same conclusion.  R is currently very conservative about memory usage,
> and this leads to unnecessarily poor performance on certain problems.
> Changing the defaults to sizes that are more appropriate for modern
> machines can often produce a 2x speedup.
>
> On Sat, Jan 17, 2015 at 8:39 AM,  <luke-tierney at uiowa.edu> wrote:
> > Martin Morgan discussed this a year or so ago and as I recall bumped
> > up these values to the current defaults. I don't recall details about
> > why we didn't go higher -- maybe Martin does.
>
> I just checked, and it doesn't seem that any of the relevant values
> have been increased in the last ten years.  Do you have a link to the
> discussion you recall so we can see why the changes weren't made?
>
> > I suspect the main concern would be with small memory machines in
> > student labs and less developed countries.
>
> While a reasonable concern, I'm doubtful there are many machines for
> which the current numbers are optimal.  The current minimum size
> increases for node and vector heaps are 40KB and 80KB respectively.
> This grows as the heap grows (min + .05 * heap), but still means that
> we do many more expensive garbage collections while growing than we
> need to.  Paradoxically, the SMALL_MEMORY compile option (which is
> suggested for computers with up to 32MB of RAM) has slightly larger
> minimum increments, at 50KB and 100KB.
>
> I think we'd get significant benefit for most users by being less
> conservative about memory consumption.    The exact sizes should be
> discussed, but with RAM costing about $10/GB it doesn't seem
> unreasonable to assume most machines running R have multiple GB
> installed, and those that don't will quite likely be running an OS
> that needs a custom compiled binary anyway.
>
> I could be way off, but a reasonable spread might be a 10MB start with
> 1MB minimum increments for SMALL_MEMORY, a 100MB start with 10MB
> increments for NORMAL_MEMORY, and a 1GB start with 100MB increments
> for LARGE_MEMORY.
>
> Or one could go even larger, noting that on most systems,
> overcommitted memory is not a problem until it is used.  Until we
> write to it, it doesn't actually use physical RAM, just virtual
> address space.  Or we could stay small, but make it possible to
> programmatically increase the granularity from within R.
>
> For ease of reference, here are the relevant sections of code:
>
> https://github.com/wch/r-source/blob/master/src/include/Defn.h#L217
> (ripley last authored on Jan 26, 2000 / pd last authored on May 8, 1999)
> 217  #ifndef R_NSIZE
> 218  #define R_NSIZE 350000L
> 219  #endif
> 220  #ifndef R_VSIZE
> 221  #define R_VSIZE 6291456L
> 222  #endif
>
> https://github.com/wch/r-source/blob/master/src/main/startup.c#L169
> (ripley last authored on Jun 9, 2004)
> 157 Rp->vsize = R_VSIZE;
> 158 Rp->nsize = R_NSIZE;
> 166  #define Max_Nsize 50000000 /* about 1.4Gb 32-bit, 2.8Gb 64-bit */
> 167  #define Max_Vsize R_SIZE_T_MAX /* unlimited */
> 169  #define Min_Nsize 220000
> 170  #define Min_Vsize (1*Mega)
>
> https://github.com/wch/r-source/blob/master/src/main/memory.c#L335
> (luke last authored on Nov 1, 2000)
> 335  #ifdef SMALL_MEMORY
> 336  /* On machines with only 32M of memory (or on a classic Mac OS port)
> 337      it might be a good idea to use settings like these that are more
> 338      aggressive at keeping memory usage down. */
> 339  static double R_NGrowIncrFrac = 0.0, R_NShrinkIncrFrac = 0.2;
> 340  static int R_NGrowIncrMin = 50000, R_NShrinkIncrMin = 0;
> 341  static double R_VGrowIncrFrac = 0.0, R_VShrinkIncrFrac = 0.2;
> 342  static int R_VGrowIncrMin = 100000, R_VShrinkIncrMin = 0;
> 343  #else
> 344  static double R_NGrowIncrFrac = 0.05, R_NShrinkIncrFrac = 0.2;
> 345  static int R_NGrowIncrMin = 40000, R_NShrinkIncrMin = 0;
> 346  static double R_VGrowIncrFrac = 0.05, R_VShrinkIncrFrac = 0.2;
> 347  static int R_VGrowIncrMin = 80000, R_VShrinkIncrMin = 0;
> 348  #endif
>
> static void AdjustHeapSize(R_size_t size_needed)
> {
>     R_size_t R_MinNFree = (R_size_t)(orig_R_NSize * R_MinFreeFrac);
>     R_size_t R_MinVFree = (R_size_t)(orig_R_VSize * R_MinFreeFrac);
>     R_size_t NNeeded = R_NodesInUse + R_MinNFree;
>     R_size_t VNeeded = R_SmallVallocSize + R_LargeVallocSize +
>         size_needed + R_MinVFree;
>     double node_occup = ((double) NNeeded) / R_NSize;
>     double vect_occup = ((double) VNeeded) / R_VSize;
>
>     if (node_occup > R_NGrowFrac) {
>         R_size_t change = (R_size_t)(R_NGrowIncrMin + R_NGrowIncrFrac * R_NSize);
>         if (R_MaxNSize >= R_NSize + change)
>            R_NSize += change;
>     }
>     else if (node_occup < R_NShrinkFrac) {
>         R_NSize -= (R_NShrinkIncrMin + R_NShrinkIncrFrac * R_NSize);
>         if (R_NSize < NNeeded)
>              R_NSize = (NNeeded < R_MaxNSize) ? NNeeded: R_MaxNSize;
>         if (R_NSize < orig_R_NSize)
>              R_NSize = orig_R_NSize;
>      }
>
>     if (vect_occup > 1.0 && VNeeded < R_MaxVSize)
>         R_VSize = VNeeded;
>     if (vect_occup > R_VGrowFrac) {
>         R_size_t change = (R_size_t)(R_VGrowIncrMin + R_VGrowIncrFrac * R_VSize);
>         if (R_MaxVSize - R_VSize >= change)
>              R_VSize += change;
>     }
>     else if (vect_occup < R_VShrinkFrac) {
>         R_VSize -= R_VShrinkIncrMin + R_VShrinkIncrFrac * R_VSize;
>         if (R_VSize < VNeeded)
>            R_VSize = VNeeded;
>         if (R_VSize < orig_R_VSize)
>            R_VSize = orig_R_VSize;
>     }
>
>     DEBUG_ADJUST_HEAP_PRINT(node_occup, vect_occup);
> }
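
To get a feel for how much the minimum increment matters, here is a rough sketch in C that counts grow steps using the same formula as the grow branch of AdjustHeapSize above (change = min + frac * size). The name grow_steps and its parameters are made up for illustration, and the sketch assumes every adjustment takes the grow branch, which a real workload will not do exactly.

```c
#include <assert.h>

/* Count how many grow steps a heap needs to get from `start` to at least
   `target` when each step adds (incr_min + incr_frac * size), truncated to
   an integer as in the grow branch of AdjustHeapSize.
   Assumption: every adjustment grows the heap (no shrinking, no cap). */
int grow_steps(unsigned long long start, unsigned long long target,
               double incr_min, double incr_frac)
{
    unsigned long long size = start;
    int steps = 0;

    /* A minimum increment of at least 1 guarantees the loop terminates. */
    assert(incr_min >= 1.0 && incr_frac >= 0.0);

    while (size < target) {
        size += (unsigned long long)(incr_min + incr_frac * (double)size);
        steps++;
    }
    return steps;
}
```

For example, grow_steps(350000, 10000000, 40000, 0.05) counts the steps the node heap would need to get from the default R_NSIZE to ten million nodes with the current non-SMALL_MEMORY settings. Passing a larger incr_min shows how many of those steps a bigger minimum would save under the same assumption.
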
>
> Rp->nsize is overridden at startup by environment variable R_NSIZE if
> Min_Nsize <= $R_NSIZE <= Max_Nsize.  Rp->vsize is overridden at
> startup by environment variable R_VSIZE if Min_Vsize <= $R_VSIZE <=
> Max_Vsize.  These are then used to set the global variables R_NSize
> and R_VSize with R_SetMaxVSize(Rp->max_vsize).
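
The range check described above could look roughly like the following. This is a simplified sketch, not the code in startup.c: pick_size and size_from_env are hypothetical names, only plain decimal values are parsed (any unit suffixes R may accept are not handled), and an out-of-range or malformed value is silently ignored in favour of the compiled-in default.

```c
#include <errno.h>
#include <stdlib.h>

/* Return the value parsed from `text` if it is a plain decimal number
   with min <= value <= max; otherwise return the default `dflt`. */
unsigned long long pick_size(const char *text, unsigned long long dflt,
                             unsigned long long min, unsigned long long max)
{
    char *end;
    unsigned long long v;

    if (text == NULL || *text == '\0')
        return dflt;                 /* variable unset or empty */

    errno = 0;
    v = strtoull(text, &end, 10);
    if (errno != 0 || *end != '\0')
        return dflt;                 /* overflow or trailing garbage */

    return (v >= min && v <= max) ? v : dflt;
}

/* Same check, reading the text from an environment variable. */
unsigned long long size_from_env(const char *name, unsigned long long dflt,
                                 unsigned long long min, unsigned long long max)
{
    return pick_size(getenv(name), dflt, min, max);
}
```

With the constants quoted above, size_from_env("R_NSIZE", 350000, 220000, 50000000) would return the value of R_NSIZE only when it lies between Min_Nsize and Max_Nsize, and 350000 otherwise.
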
