[R] which operating system + computer specifications lead to the best performance for R?

Mike Marchywka marchywka at hotmail.com
Sun Jan 23 14:17:32 CET 2011


> Date: Sat, 22 Jan 2011 19:49:43 -0800
> From: santosh.srinivas at gmail.com
> To: r-help at r-project.org
> Subject: Re: [R] which operating system + computer specifications lead to the best performance for R?
>
> Hi Josh,
>
> I was referring to the below point that I read a while back when I
> installed my first R (didn't mean to imply that 64 bit was not
> needed). Some packages also had issues on 64bit (I think I ran into
> some with RQuantLib). Maybe this could be worked around if there is
> enough time. The issues were on Windoze though, not sure about how
> things turn out on Linux (yet to try).
>
> 2.28 Should I run 32-bit or 64-bit R?
>
> For most users (especially beginners) we would recommend using the 32-
> bit build.
>
> The advantage of a native 64-bit application is that it gets a 64-bit
> address space and hence can address far more than 4GB (how much
> depends on the version of Windows, but in principle 8TB). This allows
> a single process to take advantage of more than 4GB of RAM (if
> available) and for R's memory manager to more easily handle large
> objects (in particular those of 1GB or more). The disadvantages are
> that all the pointers are 8 rather than 4 bytes and so small objects
> are larger and more data has to be moved around, and that less
> external software is available for 64-bit versions of the OS.
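
A quick aside before my two cents: you can check which build you are
actually running from within R itself; this is plain base R and works
on any platform.

  .Machine$sizeof.pointer   # 8 on a 64-bit build, 4 on a 32-bit build
  R.version$arch            # e.g. "x86_64" or "i386"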


The decision will depend on exactly
what you are doing and how you do it. It is not hard to get to the
point where you do less with more hardware. For example:

http://spectrum.ieee.org/computing/hardware/multicore-is-bad-news-for-supercomputers

I had a recent case where a bash script from our multi-core Windows
server with cygwin ran about as fast, or faster, on a second-hand
eMachines box with less than 1 GB of memory running Debian.
This is not going to be typical, but if you care about performance you
will often be more concerned with "how you use R" than with the machine
in isolation. You can handle bigger problems with less memory if you
make your data structures and algorithms work together so that you
access memory in predictable ways. Virtual memory can be quite
tolerable as long as you don't start thrashing; alternatively, if you
have streaming data sources and sinks and can write block-oriented
algorithms, you don't need to buffer a pile of data only to have one
part of it evicting another. This is as much a comment on R package
developers as on you.
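Here is a minimal sketch of the block-oriented idea; the file name and
block size are made up. It computes a running mean over a large file of
numbers, one value per line, without ever holding the whole file in RAM:

  con <- file("big_data.txt", open = "r")   # hypothetical large input
  total <- 0; n <- 0
  repeat {
      block <- as.numeric(readLines(con, n = 100000))  # one block at a time
      if (length(block) == 0) break                    # end of file
      total <- total + sum(block, na.rm = TRUE)
      n <- n + sum(!is.na(block))
  }
  close(con)
  total / n   # mean of the whole file, computed block by block

The same pattern works for anything you can accumulate incrementally,
and it never needs more memory than one block.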
My favorite example from personal experience, not using R, is another
case where I was using a laptop to do a lot of string manipulation.
With large data sets it turned out to be faster overall if I sorted
the data before passing it to the program that did all the work (which
"should be" CPU-limited). You would not expect adding a sort to speed
up the total, and since the downstream program did not know the data
was sorted, it could not exploit that fact directly. However, the more
regular memory accesses in that program avoided virtual-memory
thrashing, and the run went from unusable to no big problem. (Disk
access is on the order of 1e6 times slower than any RAM; even a fast
disk only brings that to about .5e6, which is still huge even when you
read a lot of data at once or have some of it buffered in an
OS-dependent way.)
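You can see the same effect from inside R. A toy benchmark (the sizes
are arbitrary): both sums below do identical arithmetic on identical
indices, but the sorted index walks memory in order while the random
one jumps around:

  n     <- 1e7
  x     <- runif(n)
  i_rnd <- sample.int(n)          # random access order
  i_ord <- sort(i_rnd)            # same indices, sequential order
  system.time(sum(x[i_rnd]))      # irregular memory access
  system.time(sum(x[i_ord]))      # regular access, typically faster

The gap is modest while everything fits in RAM and becomes dramatic
once you start paging.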

And finally, if you really need speed and can't keep everything inside
optimized library calls, you may need to write your own compiled code;
a sketch follows.
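As a rough sketch of what that can look like, here is the traditional
.C interface plus R CMD SHLIB route; the file and function names are
made up for illustration:

  ## Contents of dotprod.c (illustrative):
  ##   void dotprod(double *x, double *y, int *n, double *out) {
  ##       double s = 0.0;
  ##       for (int i = 0; i < *n; i++) s += x[i] * y[i];
  ##       *out = s;
  ##   }
  ## Build once from the shell:  R CMD SHLIB dotprod.c
  dyn.load("dotprod.so")          # dotprod.dll on Windows
  x <- runif(1e6); y <- runif(1e6)
  res <- .C("dotprod", as.double(x), as.double(y),
            as.integer(length(x)), out = double(1))
  res$out                         # should match sum(x * y)

For anything serious, see the 'Writing R Extensions' manual rather
than this toy.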




>
> The toolchain (compilers, linkers, ...) used to build 64-bit R is less
> mature than that for 32-bit R, but testing so far (and all the CRAN
> packages provide an extensive test suite) suggests that they are
> mature enough for production use. The compilers are able to take
> advantage of extra features of all x86-64 chips (more registers,
> SSE2/3 instructions, ...) and so the code may run faster despite using
> larger pointers.
>
> For advanced users the choice may be dictated by whether the
> contributed packages needed are available in 64-bit builds (and if
> they are not that is some indication that installing them from sources
> is problematic). At the time of writing the most commonly-used CRAN
> packages without 64-bit versions were BRugs and rggobi. The
> considerations can be more complex: for example 32/64-bit RODBC need
> 32/64 ODBC drivers respectively, and where both exist they may not be
> able to be installed together. An extreme example is the Microsoft
> Access/Excel ODBC drivers: if you have installed 64-bit Microsoft
> Office you can only install the 64-bit drivers and so need to use 64-
> bit RODBC and hence R.
>
>
> 2.29 Can both 32- and 64-bit R be installed on the same machine?
>
> Obviously, only relevant if the machine is running a 64-bit version of
> Windows – simply select both when using the installer. You can also go
> back and add 64-bit components to a 32-bit install.
>
> For many Registry items, 32- and 64-bit programs have different views
> of the Registry, but clashes can occur. The most obvious problem is
> the file association, which will use the last installation for which
> this option is selected, and if that was for an installation of both,
> will use 32-bit R.
>
>
>
> On Jan 23, 7:56 am, Joshua Wiley  wrote:
> > On Sat, Jan 22, 2011 at 6:37 PM, Santosh Srinivas
> >
> >  wrote:
> > > Hi Marc,
> >
> > > I've exactly the same question and it looks like most of the heavy users
> > > from the threads I've followed use Unix/Linux/Mac.
> > > Some threads have given rationale for a 64bit system due to memory benefits
> > > but there seems to be not much buy-in from the guys here (so I'd give that a
> > > pass). The CRAN page also isn't very excited about 64bit for now.
> >
> > Really?  Perhaps I do not understand what you meant, but doesn't most
> > HPC work take > (2^32) bytes of memory?
> >
> > > As David mentioned, Dirk's work seems to be hungry from speed and I closely
> > > (try to) follow his work.
> > > From his blog, he uses a "Debian Linux system" and that is what I've set up
> > > for myself. This obviously may just be a matter of coincidence.
> > > (But, saves me a lot of time trying to figure out issues related to the
> > > other OS's. Also, many authors of the packages that I use really don't have
> > > the time or inclination to make it Windoze friendly.)
> >
> > > My 2p in transition.
> >
> > > -----Original Message-----
> > > From: r-help-boun... at r-project.org [mailto:r-help-boun... at r-project.org] On
> > > Behalf Of David Winsemius
> > > Sent: 22 January 2011 21:02
> > > To: Marc Jekel
> > > Cc: r-h... at r-project.org Help
> > > Subject: Re: [R] which operating system + computer specifications lead to
> > > the best performance for R?
> >
> > > On Jan 22, 2011, at 10:03 AM, Sascha Vieweg wrote:
> >
> > >> On 11-01-22 14:56, Marc Jekel wrote:
> >
> > >>> I have the opportunity to buy a new computer for my simulations in
> > >>> R. My goal is to get the execution of R code as fast as possible. I
> > >>> know that the number of cores and the working memory capacity are
> > >>> crucial for computer performance but maybe someone has experience/
> > >>> knowledge which comp specifications are especially crucial
> > >>> (especially in relation to R). Is there any knowledge on the
> > >>> performance of R for different operating systems (Linux, Win, Mac
> > >>> etc.) resp. is performance dependent on the operating system at
> > >>> all? Even small differences in performance (i.e., speed of
> > >>> calculations) matter for me (quite large datasets + repeated
> > >>> calculations etc.).
> >
> > >> Not really a recommendation, just my considerations: That depends on
> > >> your budget, Mac Pro (5k$ in the U.S.) would probably serve your
> > >> needs for a long time ;-). I am running R 2.12.0 on a MacBook Pro,
> > >> 2.4 Dual Core with (only) 2G ram, together with (paid) TextMate as
> > >> editor, and Sweave. 2G ram is few! And I noted remarkable
> > >> improvements when I was lucky to use an MBP Intel Core i5 for a
> > >> couple of days. Whatever processor and memory, I like the easy
> > >> interplay between R and the Unix environment (things like passing
> > >> shell commands from R to my system or other interpreters), easy
> > >> graphics etc.
> >
> > > I also use a MacPro (circa early 1998) R 2.12.1 with 24 GB and still
> > > find it generally very capable for a dataset of 5.5 MM rows and about
> > > 150 variables using the survival and rms packages. I seem to remember
> > > a price of 4KUS$ but I didn't write that check. I haven't succeeded in
> > > getting the multi-processor applications to work, however, and my
> > > guess is that Linux boxes (and Linux users) may be more likely to
> > > offer paths to success if that is an expectation. I am mostly
> > > interested in having adequate memory space for one core anyway, as
> > > most of the packages I use don't seem to be set up for parallel
> > > execution.
> >
> > > It may depend on what development system you use and which packages
> > > you expect to install. I know there are people with the StatET-
> > > equipped systems out there but I have never been able to get a working
> > > setup on my Mac. Too many moving parts and the gears don't seem to
> > > mesh out of the box. Same with GTK2+ and its R friends.
> >
> > > This would be better posted on the HPC mailing list anyway:
> > > https://stat.ethz.ch/mailman/listinfo/r-sig-hpc
> >
> > > You might want to search with "Dirk Eddelbuettel" in your search
> > > string, since he seems to share your "need for speed" and has
> > > championed various approaches to High Performance Computing with R:
> > > http://dirk.eddelbuettel.com/bio/presentations.html
> >
> > > --
> >
> > > David Winsemius, MD
> > > West Hartford, CT
> >
> > --
> > Joshua Wiley
> > Ph.D. Student, Health Psychology
> > University of California, Los Angeles
> > http://www.joshuawiley.com/
> >
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.