[R] What is the most cost effective hardware for R?

Hugh Morgan h.morgan at har.mrc.ac.uk
Tue May 8 18:37:02 CEST 2012


Perhaps I have confused the issue.  When I initially said "data points" I 
meant one stand-alone analysis, not one piece of data.  Each analysis 
point takes 1.5 seconds.  I have not implemented running this over the 
whole dataset yet, but I would expect it to take about 5 to 10 hours.  
This is just about acceptable, but it would be better if it were 
quicker.  As I say, the exact analysis method has not yet been 
determined, and if it turns out to be significantly more computationally 
intensive, that could be an issue.
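
As a rough back-of-envelope check on run time, the sketch below multiplies 
the per-analysis time by an assumed number of analyses and divides by a 
worker count.  The 200,000 figure is taken from the count quoted later in 
this thread and may not be what "the whole dataset" means here; the 
function name and the perfect-scaling assumption are illustrative only.

```python
# Back-of-envelope run-time estimate. All inputs are assumptions, not measurements.
SECONDS_PER_ANALYSIS = 1.5   # per-analysis time quoted in this message
N_ANALYSES = 200_000         # assumed: the current count quoted later in the thread


def wall_clock_hours(n_analyses, seconds_each, workers=1):
    """Hours to finish n independent analyses, assuming perfect parallel scaling."""
    return n_analyses * seconds_each / workers / 3600.0


if __name__ == "__main__":
    # Compare a serial run with two hypothetical worker counts.
    for workers in (1, 8, 16):
        hours = wall_clock_hours(N_ANALYSES, SECONDS_PER_ANALYSIS, workers)
        print(f"{workers:2d} worker(s): {hours:6.1f} h")
```

Real speed-up will normally fall short of the ideal because of start-up 
cost, I/O and memory contention, so these figures are best read as lower 
bounds on wall-clock time.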

It is not actually a simulation; it is a pre-analysis of the dataset 
before public display.  I do have a simulation of the analysis to run, 
and that could be some orders of magnitude larger than the real 
dataset.  I can of course wait for that.

Thanks for the input.

On 05/08/2012 05:24 PM, Bert Gunter wrote:
> Probably just pointing out the obvious, but:
>
> 200,000 data points may not be that many these days, depending on the
> dimensionality of the data. Nor is 10 times that number, neither now
> nor in 5 years, again depending on data dimensionality. So my question
> is, have you actually tried running your simulations -- or a
> reasonable approximation thereof -- on a single "cheap" machine? It
> might be that your concerns are overblown, especially with multicore
> and parallelization.
>
> Obviously, ignore if you've already done this and know it's nonsense.
>
> Cheers,
> Bert
>
> On Tue, May 8, 2012 at 8:50 AM, Hugh Morgan <h.morgan at har.mrc.ac.uk> wrote:
>> On 05/08/2012 12:14 PM, Zhou Fang wrote:
>>> How many data points do you have?
>>>
>> Currently 200,000.  We are likely to have 10 times that in 5 years.
>>
>>> Why buy when you can rent? Unless your hardware is going to be
>>> running 24/7 doing these analyses then you are paying for it to sit
>>> idle. You might be better off purchasing computing time from Amazon or
>>> another cloud computing provider. If you need to run more analyses
>>> quickly, just buy some more virtual hosts.
>>
>> Because of the nature of the funding we are likely to be better off buying.
>> We are likely to be running most of the time: most of the analysis must be
>> rerun as more data becomes available, and that is likely to happen a few
>> times every week.
>>
>> Thank you for all the pointers, we shall consider them all.
>>
>>
>> This email may have a PROTECTIVE MARKING, for an explanation please see:
>> http://www.mrc.ac.uk/About/Informationandstandards/Documentmarking/index.htm
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>
>




