[R] data filtering

HENRIKSON, JEFFREY JEFHEN at SAFECO.com
Wed Jun 2 21:19:30 CEST 2004


I would like to know if there is a way to do the following command in
one step, primarily for speed on large data (5 million elements), and
secondarily for readability.

mean(delta[(intersect(which(x[['class']]==0),which(delta<1)))])


Do I really have to rely on an intersect operator?  Isn't that
O(n log n)?  Can't I just filter in one step?  As an R newbie, I would
have guessed I could write

mean(delta[which((x[['class']]==0) && (delta<1))])

But I guess no such luck, since (delta<1), etc., are vectors.  Are they
really implemented as vectors?  I.e., if I take 5M data points, does it
allocate 20MB of RAM to make a test that passes most of the elements?
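
For reference, here is a minimal sketch of the one-step form asked about
above. It assumes x is a list (or data frame) with a 'class' element and
that delta is a numeric vector of the same length; the sample data below
is made up purely for illustration. The elementwise operator '&' (rather
than '&&') combines the two comparisons into a single logical mask, so no
intersect() call is needed. The last line measures the size of a
5-million-element logical vector, which is what each comparison would
materialize.

```r
# Hypothetical stand-in data (not from the original post).
set.seed(1)
n <- 1000
x <- list(class = sample(0:2, n, replace = TRUE))
delta <- runif(n, min = 0, max = 2)

# Two-step form from the post: intersect() on two index vectors.
m_intersect <- mean(delta[intersect(which(x[['class']] == 0),
                                    which(delta < 1))])

# One-step form: '&' works elementwise and yields one logical mask,
# which can index delta directly (which() is not required).
keep <- (x[['class']] == 0) & (delta < 1)
m_mask <- mean(delta[keep])

# Each comparison on 5M elements builds a full-length logical vector;
# object.size() reports how many bytes such a vector occupies.
mask_bytes <- as.numeric(object.size(logical(5e6)))
```

Both forms select the same elements, so m_intersect and m_mask should
agree; the mask form simply avoids the index-set intersection.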

The only thing I can think of is to use closures to write something like
a Lisp list "filter".  I'm not sure about the readability merits,
especially if there is a direct way to do it.  If Matlab had closures, I
know running them in a loop would be a bear on runtime anyway.
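
As a rough illustration of the closure idea, the sketch below uses the
base R function Filter() (available in current versions of R) with a
predicate built by a closure. The names make_pred, cls and limit are
hypothetical, and the data is made up. This is only a sketch of the
approach, not a recommendation: the predicate is called once per element,
so it would be expected to run much slower than a vectorized mask on
large data.

```r
# Hypothetical stand-in data (not from the original post).
set.seed(1)
n <- 1000
cls <- sample(0:2, n, replace = TRUE)
delta <- runif(n, min = 0, max = 2)

# Closure factory: the returned predicate captures cls, delta and limit
# and tests a single index i ('&&' is fine here since both sides are scalars).
make_pred <- function(cls, delta, limit) {
  function(i) cls[i] == 0 && delta[i] < limit
}
pred <- make_pred(cls, delta, 1)

# Lisp-style filter over the indices, then take the mean of the survivors.
idx <- Filter(pred, seq_along(delta))
m_filter <- mean(delta[idx])

# Vectorized equivalent for comparison.
m_mask <- mean(delta[cls == 0 & delta < 1])
```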


Jeff Henrikson



