[R] 64-bit R and cache memory

Martin Maechler maechler at stat.math.ethz.ch
Sat May 17 18:28:13 CEST 2008


>>>>> "GA" == Gad Abraham <gabraham at csse.unimelb.edu.au>
>>>>>     on Sat, 17 May 2008 21:12:41 +1000 writes:

    GA> Joram Posma wrote:
    >> Dear all,
    >> 
    >> I have a few questions regarding the 64 bit version of R and the cache 
    >> memory R uses.
    >> 
    >> -----------------------------------
    >> Computer & software info:
    >> 
    >> OS: Kubuntu Feisty Fawn 7.04 (64-bit)
    >> Processor: AMD Opteron 64-bit
    >> R: version 2.7.0 (64-bit)
    >> Cache memory: currently 16 GB (was 2 GB)
    >> Outcome of 'limit' command in shell: cputime unlimited, filesize 
    >> unlimited, datasize unlimited, stacksize 8192 kbytes, coredumpsize 0 
    >> kbytes, memoryuse unlimited, vmemoryuse unlimited, descriptors 1024, 
    >> memorylocked unlimited, maxproc unlimited
    >> -----------------------------------
    >> 
    >> a. We have recently upgraded the cache memory from 2 to 16 GB. However, 
    >> we have noticed that somehow R still swaps memory when datasets 
    >> exceeding 2 GB in size are used. An indication that R uses approx. 2 GB 
    >> of cache memory is that sometimes R also kills the session when datasets 
    >> > 2 GB are loaded. How/where can we see how much cache memory R uses 
    >> (since memory.size and memory.limit are only for Windows, and to us 
    >> those might be what we need)? 

Use object.size(.) to see what the size of your large
data object really is.
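
For example, for a hypothetical large object 'bigData' standing in
for your dataset (the name is just for illustration):

> object.size(bigData)                     ## 'bigData' stands in for your big object
> as.numeric(object.size(bigData)) / 2^20  ## the same in MB (1 MB = 2^20 bytes)
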
Otherwise,

> gc()
         used (Mb) gc trigger (Mb) max used (Mb)
Ncells 155605  8.4     350000 18.7   350000 18.7
Vcells 155621  1.2    2006827 15.4  2156058 16.5

is typically useful, but we often also use 'top' (on Linux,
which you have as well) to monitor the R process.  Gad already
mentioned this.
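
If you want to stay inside R, one possibility (just a sketch, assuming a
Linux system whose 'top' supports batch mode) is to call it on the
current R process:

> system(paste("top -b -n 1 -p", Sys.getpid()))  ## one snapshot of this R process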

    >> Could this be caused by the limit of the 
    >> stack size (we are not exactly sure what the stack size is either)?
    >> b. And how can we increase the cache memory used by R to 14 or even 16 
    >> GB (which might be tricky when running other programs, but still)?
    >> 
    >> So in general: how can we get R to use the full memory capacity of the 
    >> computer?
    >> 

    GA> The term "cache memory" is something entirely different to what you're 
    GA> referring to --- you're talking about RAM.

yes indeed.

    GA> Anyway, under Linux R will take all the RAM it can get, and if you're 
    GA> running a 64-bit OS on a 64-bit CPU then it should definitely be able to 
    GA> use more than 2GB of RAM.

Definitely, and it does; I've tried up to 20 or so GB on a
platform very similar to yours.

HOWEVER, individual R objects can hit size limits earlier than that:

Do read  help(Memory-limits),
which has all (?) the pertinent info and, among other things, contains:

     There are also limits on individual objects.  On all versions of
     R, the maximum length (number of elements) of a vector is 2^31 - 1
     ~ 2*10^9, as lengths are stored as signed integers.  In addition,
     the storage space cannot exceed the address limit, and if you try
     to exceed that limit, the error message begins 'cannot allocate
     vector of length'. The number of characters in a character string
     is in theory only limited by the address space.

And if you now do the arithmetic: a numeric vector needs 8 bytes per
entry (plus ~ 40 bytes of overhead, it seems, currently on 64-bit Linux),
so the maximal numeric vector would need
  > (2^31-1)*8 + 40
  [1] 17179869216
bytes, which is 2^14 MBytes (using the 1 MB = 2^20 bytes
definition), i.e. 16 GB.
The maximal integer/logical vector would take half that many bytes,
i.e. ~ 8 GB.
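
Spelled out in R (using the 1 GB = 2^30 bytes definition):

> (2^31 - 1) * 8 + 40            ## bytes needed by the largest numeric vector
[1] 17179869216
> ((2^31 - 1) * 8 + 40) / 2^30   ## the same in GB
[1] 16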

The *practical* maximal size is often a bit smaller,
and note that you should typically have around 5 to 10 times as much
RAM as your largest object, because of copying.

    GA> To see the memory usage, use the utility "top" in the console/terminal.

    GA> One thing to note: a dataset of 2GB on disk may take much more than 2GB 
    GA> of RAM when loaded into R, due to the overhead of the metadata and the 
    GA> fact that pointers are 64-bit long as well.

exactly!
The 'sfsmisc' package (from CRAN) contains some Unix-only
utilities, of which Sys.ps() can be handy: it gives similar
information to 'top', but from inside R.
Use Sys.ps(fields="ALL") to see how much cpu/memory
process-specific info you can get; the default, Sys.ps() as used
below, just uses a few fields, but notably one for the memory footprint.
A memory-only version of Sys.ps() is
Sys.sizes()  {not used in the example below}; all of these are in 'sfsmisc'.
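
For a quick memory-only check, something along these lines should work
(a sketch; the exact fields reported may depend on your platform's 'ps'):

> library(sfsmisc)
> Sys.sizes()   ## memory footprint of the current R process (fields may vary)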

Here is a small excerpt of an R session (using R-devel,
i.e. 2.8.0 unstable) on one of our 32 GB AMD Opteron
64-bit systems (running Red Hat Enterprise Linux, but it could be
Debian/Ubuntu just as well):

## empty, newly started R session; a couple of MBs .. :
> gc()
         used (Mb) gc trigger (Mb) max used (Mb)
Ncells 155605  8.4     350000 18.7   350000 18.7
Vcells 155621  1.2    2006827 15.4  2156058 16.5

> x <- rep(pi,2^28)
> gc() 
            used   (Mb) gc trigger   (Mb)  max used   (Mb)
Ncells    155606    8.4     350000   18.7    350000   18.7
Vcells 268591076 2049.2  564122996 4304.0 537026523 4097.2

## Aha: now using 2 GB, and having used 4 GB intermediately {also
##  visible from 'top'}

> x <- rep(x,2) ## double the object size
> gc()
            used   (Mb) gc trigger    (Mb)   max used    (Mb)
Ncells    155607    8.4     350000    18.7     350000    18.7
Vcells 537026532 4097.2 1409694707 10755.2 1342332915 10241.2

## Yes, 4 GB now with a max. of 10.2 GB used during object construction

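## Note: 'x <- x + 1' first builds a complete copy of x (another ~ 4 GB)
## before the old x can be collected, so the footprint temporarily ~ doubles: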
> system.time(x <- x+1)
   user  system elapsed 
  3.335   2.603   5.939 

## Now this shows the footprint *before* garbage collection:
> sfsmisc::Sys.ps() 
       pid       pcpu       time        vsz       comm 
    "6451"      "3.4" "00:00:52"  "8440924"        "R" 

> gc()
            used   (Mb) gc trigger    (Mb)   max used    (Mb)
Ncells    157099    8.4     350000    18.7     350000    18.7
Vcells 537027103 4097.2 1409694707 10755.2 1342332915 10241.2

## after GC, we are back to 4 GB :

> sfsmisc::Sys.ps()
       pid       pcpu       time        vsz       comm 
    "6451"      "3.4" "00:00:52"  "4246616"        "R" 
> 
-----------------
And 'top' (or other such tools) confirms that the machine never
swapped. {I wouldn't notice easily, as I am sitting at home, and
the computer runs in a vault down there at ETH  ;-)}.

So you see, a 4 GB object was not a problem on this machine.

Martin Maechler, ETH Zurich

    GA> -- 
    GA> Gad Abraham
    GA> Dept. CSSE and NICTA
    GA> The University of Melbourne
    GA> Parkville 3010, Victoria, Australia
    GA> email: gabraham at csse.unimelb.edu.au
    GA> web: http://www.csse.unimelb.edu.au/~gabraham


