[R] Best Hardware & OS For Large Data Sets

Allan Engelhardt allane at cybaea.com
Sun Feb 28 09:59:50 CET 2010


On 27/02/10 17:47, J. Daniel wrote:
> Greetings,
>
> I am acquiring a new computer in order to conduct data analysis.  I
> currently have a 32-bit Vista OS with 3G of RAM and I consistently run into
> memory allocation problems.  I will likely be required to run Windows 7 on
> the new system, but have flexibility as far as hardware goes.  Can people
> recommend the best hardware to minimize memory allocation problems?  I am
> leaning towards dual core on a 64-bit system with 8G of RAM.  Given the
> Windows constraint, is there anything I am missing here?
>
> I know that Windows limits the RAM that a single application can access.
> Does this fact over-ride many hardware considerations?  Any way around this?
>    
You are right about the RAM limit: the way around it is to move to a
64-bit operating system.  There is an experimental build of core R for
64-bit Windows [1] and there is at least one commercial version
available [2].  (You can run the 32-bit version of R on 64-bit Windows,
but it will only use up to 3.5G of memory [3].)  How much memory you
should have really depends on your data sets and what you do with them.
I have 16G on my 4-core workstation and frequently use it up, but I do
marketing analysis on tens of millions of telco customers.  When I run
out, I overflow to AWS, which has instances with 7.5G, 15G, 17G, 34G,
and 68G of memory [4]; you may consider those as guides for your
system(s).
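
As a rough way to turn "depends on your data sets" into a number, here
is a minimal sizing sketch.  It assumes every value is stored as an
8-byte double (R's numeric type); the function name and the 20 million
by 50 example are made up for illustration, and it is written in Python
only to keep the arithmetic self-contained.

```python
# Rough sizing sketch, assuming each value is an 8-byte double.
BYTES_PER_DOUBLE = 8
GIB = 2**30  # bytes in one gibibyte


def numeric_table_gib(rows, cols):
    """Approximate size in GiB of rows x cols double-precision values."""
    return rows * cols * BYTES_PER_DOUBLE / GIB


# Hypothetical example: 20 million customers, 50 numeric variables each.
size = numeric_table_gib(20_000_000, 50)
print(f"{size:.2f} GiB")  # prints "7.45 GiB"
```

Treat the result as a lower bound: it counts one copy of the raw data
only, and an analysis will usually need room for intermediate copies as
well.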

I would reconsider the operating system constraint.  A Unix-like 64-bit
operating system (I'm a Fedora guy but anything should work well) may
be a better long-term solution and is likely to give you easier access
to cloud computing (e.g. AWS or your own cluster) when your processing
requirements grow.  Also, 64-bit seems to be better supported in that
environment.

In all instances you are still going to be constrained by R limiting a
vector to 2^31-1 elements and, worse, representing a matrix as a vector,
which means the product of the dimensions is also limited to 2^31-1.
What you gain with 64-bit is the ability to hold many more vectors in
memory at once, each of them still under that 2^31-1 limit.
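
To make the limit concrete, here is a small arithmetic sketch of what
2^31-1 elements means for vectors and matrices.  The helper names are
hypothetical, the 8 bytes per element assumes double-precision values,
and Python is used purely as a calculator here.

```python
# Per-vector element limit described above.
R_MAX_ELEMENTS = 2**31 - 1


def fits_in_one_vector(nrow, ncol=1):
    """True if an nrow x ncol matrix stays within the element limit."""
    return nrow * ncol <= R_MAX_ELEMENTS


def full_vector_gib(bytes_per_element=8):
    """Size in GiB of one maximum-length vector (8 bytes = double)."""
    return R_MAX_ELEMENTS * bytes_per_element / 2**30


print(fits_in_one_vector(46340, 46340))  # True: largest square that fits
print(fits_in_one_vector(50000, 50000))  # False: 2.5e9 elements
print(round(full_vector_gib(), 2))       # 16.0
```

So a single maximum-length vector of doubles is already about 16G,
while a 50000 x 50000 matrix is over the element limit however much RAM
the machine has.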

Hope this helps a little

Allan

[1] http://cran.r-project.org/bin/windows64/contrib/
[2] http://www.revolution-computing.com/
[3] See FAQ 2.9 at http://cran.r-project.org/bin/windows/base/rw-FAQ.html
[4] http://aws.amazon.com/ec2/instance-types/


> Thanks,
>
> JD
>
>
>


