[R] advice requested re: building "good" system (R, SQL db) for handling large datasets

Richard Pearson richard.pearson at postgrad.manchester.ac.uk
Wed Feb 6 13:25:43 CET 2008


Hi Thomas

I'm certainly no expert but thought I'd reply as I'm likely to be in a 
similar position soon.

With regard to versions of R, I think you should always run the latest 
release version. This will mean upgrading at least every 6 months, but 
that shouldn't be too much of a problem.

With OSs, you need to be aware that many (32-bit ones in particular) 
have an upper limit on the amount of RAM that can be handled (2-4 GB). 
I think if you plan to use more than 4 GB of RAM, you should definitely 
consider 64-bit Linux. I have no information or opinions as to which 
flavour of Linux.
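
The 2-4 GB figure follows from 32-bit addressing: a 32-bit pointer can 
distinguish 2^32 bytes, i.e. 4 GiB, and many 32-bit OSs reserve part of 
that range for the kernel, which leaves less than that for any one 
process. A small sketch of the arithmetic, written in Python purely for 
illustration (the function name addressable_gib is made up for this 
example):

```python
import struct


def addressable_gib(pointer_bits: int) -> float:
    """Size of the address space, in GiB, for a given pointer width."""
    return 2 ** pointer_bits / 2 ** 30


# Pointer width of the interpreter running this script
# (32 or 64 on common platforms).
POINTER_BITS = struct.calcsize("P") * 8

print(f"32-bit address space: {addressable_gib(32):.0f} GiB")
print(f"this process uses {POINTER_BITS}-bit pointers")
```

A 64-bit build removes this ceiling in practice, so installed RAM 
becomes the limit rather than the address space.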

With databases, one issue that might be relevant is whether you want to 
store data in tables (e.g. one table to store one data.frame) that can 
subsequently be manipulated in the DB, or to store R objects as R 
objects (e.g. as BLOBs). My situation is likely to be the latter case, 
and one of my concerns is that many DBs have an upper limit of 2GB on 
BLOBs, and I might potentially have objects that are larger than this.
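
To make the two options concrete, here is a minimal sketch. It is an 
illustration only, not a recommendation: it uses Python's standard 
sqlite3 and pickle modules with an in-memory database, and the table 
and column names are invented. From R, the BLOB route would start from 
serialize(obj, NULL), which returns a raw vector; how that vector is 
bound to a BLOB column depends on the database driver.

```python
import pickle
import sqlite3

# Stand-in for a small data.frame: a list of (id, value) rows.
rows = [(1, 0.5), (2, 1.5), (3, 2.5)]

con = sqlite3.connect(":memory:")

# Option 1: one table per data set. The database can filter and
# aggregate the rows itself.
con.execute("CREATE TABLE measurements (id INTEGER PRIMARY KEY, value REAL)")
con.executemany("INSERT INTO measurements VALUES (?, ?)", rows)
total = con.execute(
    "SELECT SUM(value) FROM measurements WHERE id > 1"
).fetchone()[0]

# Option 2: serialise the whole object and store it as an opaque BLOB.
# The database can only return it whole; it cannot query inside it.
con.execute("CREATE TABLE objects (name TEXT PRIMARY KEY, payload BLOB)")
payload = pickle.dumps(rows)
con.execute("INSERT INTO objects VALUES (?, ?)", ("measurements", payload))
stored = con.execute(
    "SELECT payload FROM objects WHERE name = ?", ("measurements",)
).fetchone()[0]
restored = pickle.loads(stored)

print("sum computed in the database:", total)
print("object round-trips intact:", restored == rows)
print("BLOB size in bytes:", len(payload))
```

The last line is the number to watch: with the BLOB approach the whole 
serialised object has to fit under the database's per-BLOB limit, 
whereas a table is limited per row rather than per object.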

Finally, you might get more response on database issues from R-sig-db 
than R-help.

Best wishes

Richard.


Thomas Pujol wrote:
> R-community,
> Sometime during the next 12-months, I plan on configuring a new computer system on which I will primarily run "R" and a SQL database (Microsoft SQL Server, MySQL, Oracle, etc).  My primary goal is to "optimize" the system for R, and for passing data to and from R and the database.
>
> I work with large datasets, and therefore I "think" one of my most important goals should be to maximize the amount of RAM that R can utilize effectively.
>
> I am seeking advice concerning the version of R, OS,  processor, hard-drive/storage configuration, database, etc. that I should consider. (I am guessing that I should build a system with lots of RAM, and a Linux OS, but am seeking advice from the R community.) If I choose Linux, does it matter which version I use? Any opinion regarding  implementing a commercially supported version from a vendor such as Red Hat, Sun, etc? Is any database particularly better at "exchanging" data with R?
>
> While cost is of course a consideration, it is probably a secondary consideration to overall performance, reliability, and ease of ongoing maintenance/support.
>
> Thanks!