[Rd] Reducing RAM usage using UKSM

Gregory R. Warnes greg at warnes.net
Wed Jul 16 16:47:31 CEST 2014


On Jul 16, 2014, at 9:51 AM, Prof Brian Ripley <ripley at stats.ox.ac.uk> wrote:

> On 16/07/2014 14:07, Gregory R. Warnes wrote:
>> Hi Varadharajan,
>> 
>> Linux uses copy-on-write for the memory image of forked processes.
> 
> But Linux copied it from Unix and I see no mention of Linux in the posting being replied to.

The reference was indirect.  The UKSM documentation indicates that it is implemented as a patch for the Linux kernel. 

>> Thus, you may also get significant memory savings by launching a single R process, loading your large data object, and then using fork::fork() to split off the other worker processes.
> 
> Or using the more refined interface in package 'parallel' (which is portable, unlike package 'fork': see its CRAN check results).

Thank you for pointing out the issues with fork; I'll take a look at what is going on with Solaris.
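
For reference, a minimal sketch of that copy-on-write approach using
'parallel' (Unix-alikes only; the file name, column names and aggregation
below are purely illustrative) might look like this:

library(data.table)
library(parallel)

## Load the large table once in the parent process (hypothetical file name).
dt <- fread("big_dataset.csv")

## Forked children share the parent's memory pages copy-on-write, so 'dt'
## is not duplicated as long as the workers only read from it.
results <- mclapply(c("env1", "env2"), function(env) {
  dt[test_env == env, .(total = sum(value)), by = group]
}, mc.cores = 2)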

-Greg

> 
>> -Greg
>> 
>> Sent from my iPad
>> 
>>> On Jul 16, 2014, at 5:07 AM, Varadharajan Mukundan <srinathsmn at gmail.com> wrote:
>>> 
>>> [Sending it again in plain text mode]
>>> 
>>> Greetings,
>>> 
>>> We have a fairly large dataset (around 60GB) that must be loaded and
>>> crunched in real time. The operations performed on this data are
>>> simple read-only aggregates after filtering the data.table instance
>>> on parameters passed in at query time. We need more than one such R
>>> process running to serve different testing environments (each
>>> environment has a nearly identical dataset, but with a *small number
>>> of changes*). As we all know, data.table loads the entire dataset
>>> into memory for processing, so we face a constraint on the number of
>>> such processes we can run on the machine. On a 128GB RAM machine, we
>>> are looking for ways to reduce the memory footprint so that we can
>>> spawn more instances and use the resources efficiently. One of the
>>> approaches we tried was memory de-duplication using UKSM
>>> (http://kerneldedup.org/en/projects/uksm/introduction), given that
>>> we had a few idle CPU cores. The outcome of the experiment was quite
>>> impressive, considering that the effort to set it up was small and
>>> the entire approach treats the application layer as a black box.
>>> 
>>> Quick snapshot of the results:
>>> 1 instance (without UKSM): ~60 GB RAM was being used
>>> 1 instance (with UKSM): ~53 GB RAM was being used
>>> 
>>> 2 instances (without UKSM): ~125 GB RAM was being used
>>> 2 instances (with UKSM): ~81 GB RAM was being used
>>> 
>>> We can see that around 44 GB of RAM was saved once UKSM had merged
>>> similar pages, all for the cost of one CPU core on a 48-core
>>> machine. We did not notice any degradation in performance, because
>>> the data is refreshed by a batch job only once a day (every
>>> morning); UKSM kicks in at that point and performs the page merging,
>>> and for the rest of the day it is just read-only analysis. The
>>> queries we fire on the dataset scan at most 2-3 GB of it, so the
>>> memory spike from the query subsets was low as well.
>>> 
>>> We're interested in knowing whether this is a plausible solution to
>>> this problem. Are there any other points/solutions we should be
>>> considering?
> 
> 
> -- 
> Brian D. Ripley,                  ripley at stats.ox.ac.uk
> Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
> University of Oxford,             Tel:  +44 1865 272861 (self)
> 1 South Parks Road,                     +44 1865 272866 (PA)
> Oxford OX1 3TG, UK                Fax:  +44 1865 272595


