[R] For Loop

Jeff Newmiller jdnewmil using dcn.davis.ca.us
Sun Sep 23 21:31:50 CEST 2018


On Sun, 23 Sep 2018, Wensui Liu wrote:

> What you measured is the "elapsed" time in the default setting. You
> might need to take a closer look at the beautiful benchmark() function
> and see what time I am talking about.

When I am waiting for the answer, elapsed time is what matters to me. 
Also, since each person usually has different hardware, running benchmark 
with multiple expressions as Ista did lets you pay attention to relative 
comparisons.
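
As a rough sketch of what a relative comparison looks like with base R only 
(no rbenchmark), something like the following would do. The helper name 
time_relative is hypothetical, not part of any package, and the absolute 
timings will differ on every machine, which is why the ratio is the number 
to look at.

```r
# Hypothetical helper (not part of rbenchmark): time several expressions
# with base R's system.time() and report elapsed time relative to the fastest.
time_relative <- function(..., replications = 10) {
  exprs <- as.list(substitute(list(...)))[-1]   # capture unevaluated expressions
  caller <- parent.frame()                      # evaluate them where we were called
  elapsed <- vapply(exprs, function(e) {
    system.time(for (r in seq_len(replications)) eval(e, caller))[["elapsed"]]
  }, numeric(1))
  data.frame(
    test = vapply(exprs, function(e) paste(deparse(e), collapse = " "),
                  character(1)),
    elapsed = elapsed,
    relative = round(elapsed / min(elapsed), 3)
  )
}

c1 <- 1:200000
len <- length(c1)
res <- time_relative(
  log(c1[-1] / c1[-len]),
  sapply(1:(len - 1), function(i) log(c1[i + 1] / c1[i])),
  replications = 5
)
print(res)
```

If the fastest expression finishes below the timer resolution, its elapsed 
time can be reported as zero and the relative column stops being meaningful, 
so increase replications or the vector length in that case.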

Keep in mind that parallel processing requires extra time just to 
distribute the calculations to the workers, so it doesn't pay to 
distribute tiny tasks like calculating the division of two numeric vector 
elements. That is the essence of vectorizing... bundle your simple 
calculations together so the processor can focus on getting answers rather 
than managing processes or even interpreting R for loops.
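
To make that concrete, here is a minimal sketch comparing an explicit loop 
with the vectorized expression from this thread. The names s_loop, s_vec and 
s_diff are mine, chosen for illustration; c1 and len follow the earlier posts, 
with a shorter vector so it runs instantly.

```r
c1 <- 1:1000
len <- length(c1)

# Explicit loop: the result vector must be allocated before the loop runs,
# and the interpreter handles every element individually.
s_loop <- numeric(len - 1)
for (i in seq_len(len - 1)) {
  s_loop[i] <- log(c1[i + 1] / c1[i])
}

# Vectorized: one division and one log() call over whole vectors,
# with the element-by-element iteration done in compiled code.
s_vec <- log(c1[-1] / c1[-len])

# Another vectorized spelling of the same quantity.
s_diff <- diff(log(c1))

# All three approaches agree to within floating-point tolerance.
stopifnot(isTRUE(all.equal(s_loop, s_vec)))
stopifnot(isTRUE(all.equal(s_diff, s_vec)))
```

Note that the vectorized forms also remove the need to create s beforehand, 
which was the problem raised in the original question.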

> I just provided a tentative solution for the person asking for it and
> believe he has enough wisdom to decide what's best. Why bother to
> judge others subjectively?

I would say that Ista has backed up his objections with measurable 
performance metrics, so while his initial reaction was pretty subjective I 
think your reaction at this point is really off the mark.

One confusing aspect of your response is that Ista reacted to your 
use of the Vectorize function, but you responded as though he reacted 
to your use of the pvec function. I mentioned drawbacks of using pvec 
above, but it really is important to stress that the Vectorize function is 
a usability facade and is in no way a performance enhancement to be 
associated with what we refer to as vectorized (lowercase) code.

The Vectorize function creates a wrapper function that passes your R 
function to mapply, which in turn calls the C function do_mapply, which 
calls your R function with scalar inputs as many times as needed, storing 
the results in a list, which mapply then simplifies into a vector or matrix 
result. This is clearly less efficient than a simple for loop would have 
been, rather than more efficient as a true vectorized solution such as 
log(c1[-1]/c1[-len]) will normally be. Vectorize is syntactic sugar with a 
performance penalty.
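
A small sketch of that point, assuming nothing beyond base R: Vectorize 
changes how the function may be called, not how much work is done per 
element. The names ratio_at and v_ratio are hypothetical, and the calls 
counter is something I added purely to make the number of R-level function 
calls visible.

```r
c1 <- 1:1000
len <- length(c1)

calls <- 0
# Written as if it only handled one index at a time.
ratio_at <- function(i) {
  calls <<- calls + 1          # count how often the R function is entered
  log(c1[i + 1] / c1[i])
}

# Through Vectorize: the R function is entered once per element.
v_ratio <- Vectorize(ratio_at)
s_vectorize <- v_ratio(1:(len - 1))
calls_vectorize <- calls

# Called directly: indexing, division and log() are already vectorized,
# so a single call handles every element.
calls <- 0
s_direct <- ratio_at(1:(len - 1))
calls_direct <- calls

stopifnot(isTRUE(all.equal(s_vectorize, s_direct)))
print(c(vectorize = calls_vectorize, direct = calls_direct))
```

The results are identical; only the number of trips through the R 
interpreter differs, and that is where the time goes.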

Please pay attention to the comments offered by others on this list... 
being told your solution is inferior doesn't feel good but it is a very 
real opportunity for you to improve.

End comment.

> On Sun, Sep 23, 2018 at 1:18 PM Ista Zahn <istazahn using gmail.com> wrote:
>>
>> On Sun, Sep 23, 2018 at 1:46 PM Wensui Liu <liuwensui using gmail.com> wrote:
>>>
>>> actually, by the parallel pvec, the user time is a lot shorter. or did
>>> I somewhere miss your invaluable insight?
>>>
>>>> c1 <- 1:1000000
>>>> len <- length(c1)
>>>> rbenchmark::benchmark(log(c1[-1]/c1[-len]), replications = 100)
>>>                   test replications elapsed relative user.self sys.self
>>> 1 log(c1[-1]/c1[-len])          100   4.617        1     4.484    0.133
>>>   user.child sys.child
>>> 1          0         0
>>>> rbenchmark::benchmark(pvec(1:(len - 1), mc.cores = 4, function(i) log(c1[i + 1] / c1[i])), replications = 100)
>>>                                                                test
>>> 1 pvec(1:(len - 1), mc.cores = 4, function(i) log(c1[i + 1]/c1[i]))
>>>   replications elapsed relative user.self sys.self user.child sys.child
>>> 1          100   9.079        1     2.571    4.138      9.736     8.046
>>
>> Your output is mangled in my email, but on my system your pvec
>> approach takes more than twice as long:
>>
>> c1 <- 1:1000000
>> len <- length(c1)
>> library(parallel)
>> library(rbenchmark)
>>
>> regular <- function() log(c1[-1]/c1[-len])
>> iterate.parallel <- function() {
>>   pvec(1:(len - 1), mc.cores = 4,
>>        function(i) log(c1[i + 1] / c1[i]))
>> }
>>
>> benchmark(regular(), iterate.parallel(),
>>           replications = 100,
>>           columns = c("test", "elapsed", "relative"))
>> ##                 test elapsed relative
>> ## 2 iterate.parallel()   7.517    2.482
>> ## 1          regular()   3.028    1.000
>>
>> Honestly, just use log(c1[-1]/c1[-len]). The code is simple and easy
>> to understand and it runs pretty fast. There is usually no reason to
>> make it more complicated.
>> --Ista
>>
>>> On Sun, Sep 23, 2018 at 12:33 PM Ista Zahn <istazahn using gmail.com> wrote:
>>>>
>>>> On Sun, Sep 23, 2018 at 10:09 AM Wensui Liu <liuwensui using gmail.com> wrote:
>>>>>
>>>>> Why?
>>>>
>>>> The operations required for this algorithm are vectorized, as are most
>>>> operations in R. There is no need to iterate through each element.
>>>> Using Vectorize to achieve the iteration is no better than using
>>>> *apply or a for-loop, and betrays the same basic lack of insight into
>>>> basic principles of programming in R.
>>>>
>>>> And/or, if you want a more practical reason:
>>>>
>>>>> c1 <- 1:1000000
>>>>> len <- 1000000
>>>>> system.time( s1 <- log(c1[-1]/c1[-len]))
>>>>    user  system elapsed
>>>>   0.031   0.004   0.035
>>>>> system.time(s2 <- Vectorize(function(i) log(c1[i + 1] / c1[i])) (1:len))
>>>>    user  system elapsed
>>>>   1.258   0.022   1.282
>>>>
>>>> Best,
>>>> Ista
>>>>
>>>>>
>>>>> On Sun, Sep 23, 2018 at 7:54 AM Ista Zahn <istazahn using gmail.com> wrote:
>>>>>>
>>>>>> On Sat, Sep 22, 2018 at 9:06 PM Wensui Liu <liuwensui using gmail.com> wrote:
>>>>>>>
>>>>>>> or this one:
>>>>>>>
>>>>>>> (Vectorize(function(i) log(c1[i + 1] / c1[i])) (1:len))
>>>>>>
>>>>>> Oh dear god no.
>>>>>>
>>>>>>>
>>>>>>> On Sat, Sep 22, 2018 at 4:16 PM rsherry8 <rsherry8 using comcast.net> wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>> It is my impression that good R programmers make very little use of the
>>>>>>>> for statement. Please consider  the following
>>>>>>>> R statement:
>>>>>>>>          for( i in 1:(len-1) )  s[i] = log(c1[i+1]/c1[i], base = exp(1) )
>>>>>>>> One problem I have found with this statement is that s must exist before
>>>>>>>> the statement is run. Can it be written without using a for
>>>>>>>> loop? Would that be better?
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Bob
>>>>>>>>
>>>>>>>> ______________________________________________
>>>>>>>> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>>>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>>>>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>>>>>>> and provide commented, minimal, self-contained, reproducible code.

---------------------------------------------------------------------------
Jeff Newmiller                        The     .....       .....  Go Live...
DCN:<jdnewmil using dcn.davis.ca.us>        Basics: ##.#.       ##.#.  Live Go...
                                       Live:   OO#.. Dead: OO#..  Playing
Research Engineer (Solar/Batteries            O.O#.       #.O#.  with
/Software/Embedded Controllers)               .OO#.       .OO#.  rocks...1k