[Rd] --max-vsize

Christophe Rhodes csr21 at cantab.net
Tue Jul 26 11:08:53 CEST 2011


Prof Brian Ripley <ripley at stats.ox.ac.uk> writes:

> Point 1 is as documented: you have exceeded the maximum integer and it
> does say that it gives NA.  So the only 'odd' is reporting that you
> did not read the documentation.

I'm sorry; I thought that my message made it clear that I was aware
that the NA came from exceeding the maximum representable integer.  To
address, belatedly, the other information I failed to provide: I use R
on Linux, on both 32-bit and 64-bit systems (with 64-bit R on the
latter).
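
For what it's worth, a minimal illustration of the behaviour I had
tripped over (not necessarily the exact code path inside mem.limits()):

    .Machine$integer.max        # 2147483647
    as.integer(5 * 1024^3)      # NA, with a warning about the integer range
    .Machine$integer.max + 1L   # NA, with an integer-overflow warning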

> Point 2 is R not using the correct units for --max-vsize (it used the
> number of Vcells, as was once documented), and I have fixed.

Thank you; I've read the changes and I think they meet my needs.  (I
will try to explain below how and why I want to use larger-than-integer
mem.limits(); if there's a better or more supported way to achieve what
I want, that would be fine too.)

> But I do wonder why you are using --max-vsize: the documentation says
> it is very rarely needed, and I suspect that there are better ways to
> do this.

Here's the basic idea: I would like to be able to restrict R to a
large but fixed amount of memory (say 4GB, for the sake of argument),
but in such a way that I can raise that limit temporarily if it turns
out to be necessary for some reason.
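
As a sketch of what I mean (the exact flag syntax and units are only
illustrative, particularly given the unit fix discussed in this
thread):

    ## From the shell, start the session with a hard cap on the vector
    ## heap, e.g. something like
    ##     R --max-vsize=4G
    ## (or an equivalent byte count, depending on what the front end
    ## accepts).  Then, inside R, inspect the current limits:
    mem.limits()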

The desire for a restriction is that I have found it fairly difficult
to predict in advance how much memory a given calculation or analysis
is going to take.  Part of that is my inexperience with R, leading to
hilarious thinkos, but I think part of that difficulty is going to
remain even as I gain experience.  I use R both on multi-user systems
and on single-user, multiple-use systems, and in both cases it is
usually bad if my R session causes the machine to swap.  That swapping
is usually not the result of a desired computation (most often it comes
from a straightforward mistake), but it can take a substantial amount
of time for the machine to respond to aborts or kill requests, and if
the process grows enough to touch swap it will usually keep growing
beyond the swap limit too.

So, why not simply slap on an address-space ulimit instead (that being
the kind of ulimit on Linux that actually works...)?  Well, one reason
is that it then becomes necessary to estimate, at the start of an R
session, how much memory will be needed over the lifetime of that
session; guess too low, and at some point later (maybe days or even
weeks later) I might get a failure to allocate.  My options at that
stage would be to save the workspace and restart the session with a
higher limit, or to attempt to delete enough things from the existing
workspace to allow the allocation to succeed.  (Have I missed
anything?)  Saving and restarting takes substantial time (from writing
~4GB to disk), while deleting things from the existing session involves
cognitive overhead that is irrelevant to my current investigation and
may in any case not free enough memory.
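
For concreteness, the two options look roughly like this (the object
and file names are, of course, hypothetical):

    ## Option 1: save the whole workspace and restart with a higher
    ## limit.  Writing a ~4GB workspace to disk is what takes the time.
    save.image("analysis.RData")
    quit(save = "no")
    ## ...then restart R with a larger limit and load("analysis.RData").

    ## Option 2: try to free enough space in the running session.
    rm(big_intermediate)   # some expendable object (hypothetical)
    gc()                   # actually collect, so the allocation can be retried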

So, being able to raise the limit to something generally large for a
short time, perform a computation, get the results, and then lower the
limit again lets me protect myself in general from overwhelming the
machine with mistaken computations, while still letting me dedicate
more resources to a particular computation when I need to.
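
In code, the workflow I'm after is roughly the sketch below; whether
lowering the limit again is actually permitted, and in what units the
value is interpreted, will depend on the R version and on the fix
above, so this is illustrative rather than a recipe:

    ## Running under, say, --max-vsize=4G:
    mem.limits(vsize = 16 * 1024^3)  # temporarily allow a larger vector heap
    result <- expensive_step()       # hypothetical step needing the headroom
    mem.limits(vsize = 4 * 1024^3)   # drop back to the usual protective limit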

> I don't find reporting values of several GB as bytes very useful, but
> then mem.limits() is not useful to me either ....

Ah, I'm not particularly interested in the reporting side of
mem.limits() :-); the setting side, on the other hand, interests me
very much.

Thank you again for the fixes.

Best,

Christophe


