[R] flops calculation

jim holtman jholtman at gmail.com
Mon Oct 29 19:19:19 CET 2007


But typically what you are looking for is the "number of operations"
per unit time.  In the case of R, what you would probably do is count
the operations performed over a number of scenarios, measure the CPU
time they take, and divide the operation count by the CPU time to get
"number of operations per CPU second".  In most cases, even if you are
running on a system that has multiple jobs being run, where your
elapsed time may vary greatly depending on what else is running, the
amount of CPU time that you need to perform some sequence of
operations (if the data is the same) will remain fairly constant.

So what I tend to look at is the amount of CPU time that it takes to
process a set of data of some given size.  I am usually doing data
analysis on computer systems where we are processing several million
transactions a day.  The data is a record of each transaction with its
response time.  R is great for working with this data and breaking it
down by transaction and time of day to see how performance varies.
This is also correlated with the performance metrics from the
operating system (CPU time, I/O, memory usage, network bandwidth,
etc.).  In these cases the relationship between the CPU time used and
the number of transactions processed remains relatively constant.  The
time per unit of work does increase with the size of the data, since
there is sorting/merging going on and those operations do not scale
linearly with the data size.  (A small sketch of that kind of timing
follows below.)
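
Here is a sketch of that kind of measurement using made-up transaction
data (the column names and sizes are hypothetical, just to show the
shape of the exercise):

  sizes <- c(1e5, 5e5, 1e6)
  cpu.per.rec <- sapply(sizes, function(n) {
      ## fake transaction log: transaction id plus response time
      resp <- data.frame(txn  = sample(letters, n, replace = TRUE),
                         time = rexp(n))
      tm <- system.time({
          resp <- resp[order(resp$txn, resp$time), ]   # sort/merge step
          agg  <- tapply(resp$time, resp$txn, mean)    # break down by txn
      })
      (tm[["user.self"]] + tm[["sys.self"]]) / n       # CPU seconds per record
  })
  cpu.per.rec

Comparing the CPU time per record across sizes shows the non-linear
effect of the sorting step.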

On 10/29/07, Thomas Lumley <tlumley at u.washington.edu> wrote:
> On Sun, 28 Oct 2007, kevinchang wrote:
>
> >
> > Hi all,
> >
> > Since proc.time return three different kind of times (user, system and
> > elapsed) , I am wondering which one is right for calculating flops.
>
> Probably none of them.  The 'user'+'system' time is the amount of CPU time
> that can be blamed on R.  For most applications this time is mostly not
> used in floating point calculations, so it doesn't really seem meaningful
> to use it in a denominator and call the result 'flops'.
>
> Sometimes even the 'user'+'system' time is an underestimate of R's
> resource use -- for example, indexing a large file in RSQLite takes a long
> elapsed time with relatively little user or system time, apparently
> because the system is waiting for disk seeks [the disk is being treated as
> a random-access device].
>
>        -thomas
>
>
>
> Thomas Lumley                   Assoc. Professor, Biostatistics
> tlumley at u.washington.edu        University of Washington, Seattle
>
>


-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem you are trying to solve?


