[R] Speeding up a loop
ruipbarradas at sapo.pt
Sat Jul 21 20:53:33 CEST 2012
Ok, sorry, I should have included some comments.
The function is divided into three parts: 1. intro, 2. decision, 3. keep rows.
Part 3 is the function keep(), internal to to.keep(). Let's start with 1.
1. Set up some variables first.
1.a) The variables 'a'.
If the input object 'x' is a matrix, extracting the columns first doesn't
give a great speed-up, but if 'x' is a data.frame, extraction is time-consuming.
So do it only once, at the beginning.
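To illustrate 1.a), here is a minimal sketch, not the code of to.keep() itself. The data are made up, the column names a1 to a4 are assumed from the condition in part 2, and slow() and fast() are hypothetical helpers that only sum one column in a loop.

```r
# Made-up data; the column names a1..a4 are an assumption.
set.seed(1)
x <- data.frame(a1 = runif(100), a2 = runif(100),
                a3 = runif(100), a4 = runif(100))

# Extract each column once, before any loop.
a1 <- x[, "a1"]
a2 <- x[, "a2"]
a3 <- x[, "a3"]
a4 <- x[, "a4"]

# Hypothetical helper: indexes the data.frame on every iteration.
slow <- function(x) {
  s <- 0
  for (i in seq_len(nrow(x))) s <- s + x[i, "a1"]
  s
}

# Hypothetical helper: extracts the column once, then indexes a plain vector.
fast <- function(x) {
  a1 <- x[, "a1"]
  s <- 0
  for (i in seq_len(nrow(x))) s <- s + a1[i]
  s
}

# Both give the same answer; only the cost of indexing differs.
stopifnot(isTRUE(all.equal(slow(x), fast(x))))
```

Wrapping each call in system.time() on a larger data.frame should show the difference, though I have not included timings here.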
1.b) The new environment.
This is because my first version would need to change values declared
outside the internal function.
This can be done with the global assignment operator, <<-, but this
practice should be avoided; it's easy to mess things up.
Note that all the variables changed inside the internal function are in
this new environment, 'e'.
In particular note that 'result' is initialized with 1000 rows.
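A small sketch of why the environment is used. The functions bump_local() and bump_env() are hypothetical and exist only for this illustration; the field names follow the ones mentioned in this message, and the 4 columns of 'result' are an assumption.

```r
# Changes to an ordinary argument are made on a local copy and lost on return.
bump_local <- function(counter) {
  counter <- counter + 1
  invisible(NULL)
}

# An environment is modified in place, so the change survives the call.
bump_env <- function(e) {
  e$counter <- e$counter + 1
  invisible(NULL)
}

counter <- 0
bump_local(counter)   # 'counter' outside the function is unchanged

e <- new.env()
e$counter <- 0
bump_env(e)           # e$counter has been incremented

# Initial state along the lines described above (ncol = 4 is an assumption).
e$increment <- 1000
e$curr.rows <- 1000
e$ires <- 0
e$result <- matrix(NA_real_, nrow = e$curr.rows, ncol = 4)
```

No <<- is needed: everything the internal function changes lives in 'e'.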
2. The loop.
This is where we decide if we want to keep that row. I have negated the
condition from an original 'no'.
The 'no' condition:
a1[i] < a1 & a2[i] < a2 & a3[i] > a3 & a4[i] < a4
Then the test would be:
if(any(no)) dont_keep else keep. # pseudo-code
Not in pseudo-code:
if( all( !no ) ) keep(i, e)
The downside of this is that the original is more readable.
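The two forms of the test are equivalent, since all(!no) is the same as !any(no). A quick check on made-up random data (the vectors a1 to a4 here are not the real data, just an illustration):

```r
# Made-up data standing in for the four columns.
set.seed(42)
n <- 20
a1 <- runif(n); a2 <- runif(n); a3 <- runif(n); a4 <- runif(n)

keep_any <- logical(n)   # decision written as "not any(no)"
keep_all <- logical(n)   # decision written as "all(!no)"

for (i in seq_len(n)) {
  # The 'no' condition for row i, compared against every row at once.
  no <- a1[i] < a1 & a2[i] < a2 & a3[i] > a3 & a4[i] < a4
  keep_any[i] <- !any(no)
  keep_all[i] <- all(!no)
}

# Both formulations select exactly the same rows.
stopifnot(identical(keep_any, keep_all))
```

So the more readable if(!any(no)) keep(i, e) could be used instead, with the same result.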
3. The internal function, keep().
Considering the small number of rows I have used for tests, e$result was
initialized with 1e3 rows.
With 5e5 lines I would increase this number to 1e5.
First, the function updates the [row number] pointer into 'result' and
checks if we are at a 'result' limit.
If yes, make it bigger by e$increment [ == 1e3 ] rows.
Then just assign row i from matrix/df 'x' to the appropriate row of 'result'.
The reason we need the environment is that on function return,
everything but the returned value is lost.
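A simplified sketch of the steps in part 3, not the exact code of to.keep(). It assumes 'x' is a numeric matrix and passes it as an argument (in the real function keep() is internal, so 'x' is visible without that). The tiny increment and the "keep even rows" rule are only there to exercise the growth step.

```r
keep <- function(i, x, e) {
  # Advance the row pointer into 'result'.
  e$ires <- e$ires + 1
  # At the limit: grow 'result' by e$increment rows.
  if (e$ires > e$curr.rows) {
    extra <- matrix(NA_real_, nrow = e$increment, ncol = ncol(e$result))
    e$result <- rbind(e$result, extra)
    e$curr.rows <- e$curr.rows + e$increment
  }
  # Store row i of 'x' in the next free row of 'result'.
  e$result[e$ires, ] <- x[i, ]
  invisible(NULL)
}

# Made-up input: 10 rows, 4 columns.
x <- matrix(seq_len(40), ncol = 4)

e <- new.env()
e$increment <- 3      # deliberately small, so that growth happens
e$curr.rows <- 3
e$ires <- 0
e$result <- matrix(NA_real_, nrow = e$curr.rows, ncol = ncol(x))

# Stand-in decision rule: keep the even rows.
for (i in seq_len(nrow(x))) {
  if (i %% 2 == 0) keep(i, x, e)
}

# Drop the unused rows left over from the last growth step.
result <- e$result[seq_len(e$ires), , drop = FALSE]
```

Growing in blocks rather than one row at a time keeps the number of rbind() calls small, which is the point of the increment.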
We could save the values of ires, curr.rows and result in a list and
return the list.
But this would complicate and slow things down. Assign, update and
Environments can help keep it "simple", in the sense of keeping together
what is meant to be used together.
And now I hope there is not an overdose of comments :)
On 21-07-2012 18:37, wwreith wrote:
> Any chance I could ask for an idiots guide for function to.keep(x). I
> understand how to use it but not what some of the lines are doing. Comments
> would be extremely helpful.