[R] Vectorised operations
John Logsdon
j.logsdon at quantex-research.com
Wed May 18 15:32:49 CEST 2016
Folks
I have some very long vectors - typically 1 million long - which are
indexed by another vector, same length, with values from 1 to a few
thousand, so each sub part of the vector may be a few hundred values long.
I want to calculate the cumulative maximum of each sub part of the main
vector by the index in an efficient manner. This can obviously be done in
a loop but the whole calculation is embedded within many other
calculations which would make everything very slow indeed. All the other
sums are vectorised already.
For example,
A=c(1,2,1, -3,5,6,7,4, 6,3,7,6,9, ...)
i=c(1,1,1, 2,2,2,2,2, 3,3,3,3,3, ...)
where i has three levels whose sub parts are not all the same length, but
the index values themselves are monotonic non-decreasing.
I want the answer to be a vector of the same length:
R=c(1,2,2, -3,5,6,7,7, 6,6,7,7,9, ...)
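For reference, the loop version mentioned above can be sketched as follows.
This is only a minimal sketch on the small example, with the trailing "..."
dropped; the function name grouped_cummax_loop is made up for illustration,
and it assumes the index is sorted so that each sub part is contiguous.

```r
# Example data from the message (the trailing "..." is dropped here)
A <- c(1, 2, 1, -3, 5, 6, 7, 4, 6, 3, 7, 6, 9)
i <- c(1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3)

# Hypothetical helper: cumulative maximum that restarts at each change of index.
# Assumes idx is sorted, so every sub part occupies consecutive positions.
grouped_cummax_loop <- function(x, idx) {
  out <- numeric(length(x))
  for (k in seq_along(x)) {
    if (k == 1 || idx[k] != idx[k - 1]) {
      out[k] <- x[k]                    # first element of a new sub part
    } else {
      out[k] <- max(out[k - 1], x[k])   # carry the running maximum forward
    }
  }
  out
}

R_loop <- grouped_cummax_loop(A, i)
```

On the example this reproduces the R vector above, but an element-by-element
loop like this is exactly what is too slow at a million values.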
If I could reset the cumulative maximum to -1e6 (e.g.) at each change of
index, a simple cummax would do, but I can't see how to do this.
The best way I have found so far is to use the aggregate command:
as.vector(unlist(aggregate(A,list(i),cummax)[[2]]))
but occasionally this fails, returning a shorter vector than expected, and
it seems rather ugly, converting to and from lists, which may well be an
unnecessary overhead.
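For comparison, a possible alternative that is not among the approaches tried
in this message is base R's ave(), which applies a function within each group
and returns a vector of the same length and in the original order. The sketch
below uses only the small example (trailing "..." dropped); how it performs on
vectors a million long has not been checked here.

```r
# Example data from the message (the trailing "..." is dropped here)
A <- c(1, 2, 1, -3, 5, 6, 7, 4, 6, 3, 7, 6, 9)
i <- c(1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3)

# ave() splits A by the grouping vector i, applies cummax to each sub part,
# and puts the results back in the positions the values came from.
R_ave <- ave(A, i, FUN = cummax)
```

Because ave() writes each group's result back into its original positions, no
conversion to and from lists is needed in the calling code.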
I have been trying other approaches using apply() methods but either it
can't be done using them or I can't get my head round them!
Any ideas?
Best wishes
John
John Logsdon
Quantex Research Ltd
+44 161 445 4951/+44 7717758675
