[R] Loop: noob question

Bert Gunter gunter.berton at gene.com
Fri Aug 5 19:39:31 CEST 2011


Inline below.

On Fri, Aug 5, 2011 at 10:23 AM, R. Michael Weylandt
<michael.weylandt at gmail.com> wrote:
> Bert,
>
> You are absolutely correct: I was wrong not to vectorize in this case.
>

No. That wasn't my point at all. In this case, vectorizing doesn't
seem to help because you still must do a loop (via *ply) in R. My
point was only that replicate() is another form of a loop and that,
when possible, vectorizing is preferable. Here I don't think it's
possible. However, I interpreted your advice as a general
prescription, and that was the problematic part.

> I am surprised, however, by your remark that sapply() (or really lapply())
> is faster than apply() -- is there a reason for this?

I apologize for being unclear. I **do not know** this to be true,
although one might guess it to be so by looking at the apply() code,
which has some overhead in it before calling vapply() to do the actual
loop. However, I defer to experts for confirmation -- or not.

Cheers,
Bert

I would have guessed
> that the major difference between the two would have been memory management
> since replicate only holds 250 numbers at a time, rather than 2.5e6. I don't
> know how memory management is implemented on the C level for R so I might be
> entirely wrong about that though.
>
> For UnitRoot: if you go with Bert's code, there's a little typo:
>
> rmat <- matrix(rnorm(2.5e7, nrow=250))  ## each column is a sample of 250
>
> move a parenthesis:
>
> rmat <- matrix(rnorm(2.5e7), nrow=250)  ## each column is a sample of 250
>
> Michael Weylandt
>
> On Fri, Aug 5, 2011 at 12:51 PM, Bert Gunter <gunter.berton at gene.com> wrote:
>>
>> Michael:
>>
>> I'm sorry, but this "advice" is wrong. replicate() **IS** essentially
>> a loop: it uses sapply(), which is basically an interpreted loop (with
>> suitable caveats that R experts can provide).
>>
>> The correct advice is: whenever possible, move the loops down to
>> underlying C code by vectorizing. In the example below, one can partly
>> do this by removing the random number generation from the loop and
>> structuring the result as a matrix:
>>
>> rmat <- matrix(rnorm(2.5e7, nrow=250))  ## each column is a sample of 250
>>
>> Now one must use apply() -- which is a loop for -- quantile()
>>
>> out <- apply(rmat,2,quantile, probs = .95)
>>
>> Unfortunately, in this case, it won't make much difference, as random
>> number generation is very fast anyway and the looping for quantile()
>> is the same either way. In fact, both versions took almost the same
>> time on my computer (replicate() was actually 3 seconds faster --- 48
>> vs 51 seconds -- perhaps because sapply() is slightly more efficient
>> than apply() ).
>>
>> As here (I think), one often cannot vectorize and must loop in
>> interpreted code, and for this replicate() is fine and yields nice
>> clean code. But it's still a loop.
>>
>> Cheers,
>> Bert
>>
>> On Fri, Aug 5, 2011 at 9:15 AM, R. Michael Weylandt
>> <michael.weylandt at gmail.com> wrote:
>> > This is a textbook of when NOT to use a loop in R: rather make a
>> > function
>> > that does what you want and use the replicate function to do it
>> > repeatedly.
>> >
>> > f <- function(){
>> > return(-1000*quantile(rnorm(250,0,0.2),0.95)
>> > }
>> >
>> > x = replicate(1e5,f())
>> >
>> > There are your desired numbers.
>> >
>> > Some general coding principles: firstly, you don't need to convert to
>> > time
>> > series: quantile immediately undoes that so you've just wasted the time
>> > doing the coercions each direction. Secondly, the quantiles of c*X are
>> > exactly c times the quantiles of X so you can cut down on the
>> > multiplications needed by doing the -1000 multiplication after
>> > quantilization.
>> >
>> > Specific to R: don't use loops unless entirely necessary. (And it's
>> > hardly
>> > ever necessary) -- replicate is great for repeated operations, apply is
>> > great for looping "over" thins.
>> >
>> > More broadly, what do you intend to do with the v95 values? There are
>> > probably much more efficient ways to get the desired numbers or even
>> > closed
>> > form results. I believe this idea is widely studied as VaR in finance.
>> >
>> > Feel free to write back if we can be of more help.
>> >
>> > Hope this helps,
>> >
>> > Michael Weylandt
>> >
>> > On Fri, Aug 5, 2011 at 11:35 AM, UnitRoot <akhussanov at gmail.com> wrote:
>> >
>> >> Hi,
>> >> Can someone help me out to create a (for?) loop for the following
>> >> procedure:
>> >>
>> >> x=rnorm(250,0,0.02)
>> >> library(timeSeries)
>> >> x=timeSeries(x)
>> >> P=1000
>> >> loss=-P*x
>> >> loss
>> >> v95=quantile(loss,0.95)
>> >> v95
>> >>
>> >> I would like to generate 100 000 v95 values.
>> >> Also, can anybody name a source where I could read up basic programming
>> >> in
>> >> R?
>> >> Thank you.
>> >>
>> >> --
>> >> View this message in context:
>> >> http://r.789695.n4.nabble.com/Loop-noob-question-tp3721500p3721500.html
>> >> Sent from the R help mailing list archive at Nabble.com.
>> >>
>> >> ______________________________________________
>> >> R-help at r-project.org mailing list
>> >> https://stat.ethz.ch/mailman/listinfo/r-help
>> >> PLEASE do read the posting guide
>> >> http://www.R-project.org/posting-guide.html
>> >> and provide commented, minimal, self-contained, reproducible code.
>> >>
>> >
>> >        [[alternative HTML version deleted]]
>> >
>> > ______________________________________________
>> > R-help at r-project.org mailing list
>> > https://stat.ethz.ch/mailman/listinfo/r-help
>> > PLEASE do read the posting guide
>> > http://www.R-project.org/posting-guide.html
>> > and provide commented, minimal, self-contained, reproducible code.
>> >
>>
>>
>>
>> --
>> "Men by nature long to get on to the ultimate truths, and will often
>> be impatient with elementary studies or fight shy of them. If it were
>> possible to reach the ultimate truths without the elementary studies
>> usually prefixed to them, these would not be preparatory studies but
>> superfluous diversions."
>>
>> -- Maimonides (1135-1204)
>>
>> Bert Gunter
>> Genentech Nonclinical Biostatistics
>
>



More information about the R-help mailing list