[R] Ideal (possible) configuration for an exalted R system

Matthew Keller mckellercran at gmail.com
Mon Mar 9 20:59:47 CET 2009


I also work with very large datasets, and I'm currently using an
early-2008 Mac Pro (2 x 3 GHz quad-core Intel Xeon) with 32 GB of RAM.
It works very well, although (I'm ashamed to say, since it's partly a
reflection of my programming skills) I still run out of RAM
occasionally. Overall, though, the system performs well and is not all
that costly compared to comparable workstations/servers.
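
Not hardware advice per se, but here is a minimal sketch of the kind of
housekeeping that helps when RAM gets tight (all base R; the matrix is
just a stand-in for a real dataset):

x <- matrix(rnorm(1e7), nrow = 1e4)   # ~80 MB of doubles, a stand-in dataset
print(object.size(x), units = "Mb")   # RAM occupied by one object
gc()                                  # run the garbage collector; reports memory in use
rm(x); gc()                           # drop the object and reclaim its memory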

That said, Intel has just released a new processor line (the Xeon 5500
series, based on the Nehalem architecture). Mac Pros are currently the
only computers shipping with them, but by the end of March they should
appear in plenty of systems and servers. I think these new chips will
offer serious speed advantages to R users, because their on-die memory
controllers (something AMD has offered for about five years) ease
memory bottlenecks. RAM-intensive tasks should see a substantial
speedup.

All that said, I'm not sure how the new Xeon 5500 chips will compare
in performance with existing AMD processors.
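
For anyone benchmarking candidate boxes, a crude memory-bound timing
along these lines (run identically on each system; the vector length
is arbitrary, so scale it to your RAM) would at least expose
memory-bandwidth differences:

n <- 5e7                        # 5e7 doubles = ~400 MB per vector
x <- rnorm(n)
y <- rnorm(n)
system.time(z <- x + y)         # element-wise add: memory-bound, little arithmetic
system.time(s <- sum(x * y))    # dot product: streams both vectors through RAM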

I'd be curious whether anyone out there has ideas about optimal
systems for R. We're about to invest serious money in some servers
here for statistical genetics work.

Best,

Matt



On Mon, Feb 16, 2009 at 2:44 PM, Kingsford Jones
<kingsfordjones at gmail.com> wrote:
> Hi Harsh,
>
> The useR! 2008 site has useful information.  E.g. talks by
>
> Graham Williams:
>
> http://www.statistik.uni-dortmund.de/useR-2008/slides/Williams.pdf
>
> Dirk Eddelbuettel:
>
> http://www.statistik.uni-dortmund.de/useR-2008/tutorials/useR2008introhighperfR.pdf
>
> and others
>
> http://www.statistik.uni-dortmund.de/useR-2008/abstracts/AbstractsByTopic.html#High%20Performance%20Computing
>
>
>
> A few days ago I was googling to see what types of workstations are
> available these days.  Here are some with up to 64 GB of RAM:
>
> http://www.colfax-intl.com/jlrid/SpotLight.asp?IT=0&RID=80
>
> Perhaps it won't be long before we see such memory in laptops:
>
> http://www.ubergizmo.com/15/archives/2009/01/samsung_opens_door_to_32gb_ram_stick.html
>
> Like you, I'd also be interested in hearing about configurations
> folks have used to work with large datasets.
>
>
> hth,
>
> Kingsford Jones
>
>
>
>
>
>
>
> On Mon, Feb 16, 2009 at 5:10 AM, Harsh <singhalblr at gmail.com> wrote:
>> Hi All,
>> I am trying to assemble a system that will allow me to work with large
>> datasets (45-50 million rows, 300-400 columns), possibly amounting to
>> 10 GB+ in size.
>>
>> I am aware that 64-bit builds of R on Linux boxes are suitable for
>> such an exercise, but I am looking for configurations that R users
>> out there may have used in creating a high-end R system.
>> Because SAS users have a lot of apprehensions about R's data
>> limitations, I want to demonstrate R's usability even with very large
>> datasets such as those mentioned above.
>> I would be glad to hear from users (sharing configurations and
>> system-specific information) who have desktops/servers on which they
>> use R to crunch massive datasets.
>>
>>
>> Any suggestions for extending R's capabilities in the face of
>> gigabyte-class datasets would be appreciated.
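
One point worth flagging: 50 million rows by 400 numeric columns at 8
bytes per double is roughly 160 GB if held in RAM as a plain matrix,
so the 10 GB figure presumably refers to the on-disk files. For data
at that scale, one option is to memory-map the data from disk rather
than load it, e.g. with the bigmemory package on CRAN. A sketch, with
placeholder file names:

library(bigmemory)  # file-backed big.matrix objects; one storage type for all columns

## Parse a large CSV once into a binary backing file; later sessions
## can re-attach it via the descriptor file. The data stay on disk and
## are memory-mapped, so RAM use stays modest.
x <- read.big.matrix("big_data.csv", header = TRUE, type = "double",
                     backingfile = "big_data.bin",
                     descriptorfile = "big_data.desc")

dim(x)        # e.g. 50e6 x 400
mean(x[, 1])  # pull one column at a time into RAM as a plain vector

The ff package takes a similar file-backed approach, and a relational
database queried via RSQLite or RODBC for pulling subsets is another
route.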
>>
>> Thanks
>> Harsh Singhal
>> Decision Systems,
>> Mu Sigma Inc.
>> Chicago, IL
>>



-- 
Matthew C Keller
Asst. Professor of Psychology
University of Colorado at Boulder
www.matthewckeller.com



