[R] any other fast method for median calculation

Thomas Lumley tlumley at u.washington.edu
Tue Apr 14 17:18:13 CEST 2009


On Tue, 14 Apr 2009, S Ellison wrote:

> Sorting with an appropriate algorithm is O(n log n), so it's very hard to
> get the 'exact' median any faster.

There actually are linear-time algorithms for the median (selection algorithms, which find the k-th smallest value without fully sorting), but n has to be very large before they are worth using, and by then you also have to consider locality of reference and other memory effects.
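
A minimal sketch of one such selection algorithm, Hoare's quickselect, in R. It finds the k-th smallest value in expected linear time (worst case quadratic; the median-of-medians method guarantees linear worst-case time but is more involved). The function names quickselect() and select_median() are hypothetical. The sketch assumes a numeric vector with no NAs. Because each pass allocates new vectors, it will likely be slower in practice than the built-in median(), which already avoids a full sort by calling sort() with the partial= argument.

```r
# Hypothetical helper: Hoare-style quickselect with a random pivot.
# Returns the k-th smallest value of x (expected O(n), worst case O(n^2)).
# Assumes x is numeric with no NAs.
quickselect <- function(x, k) {
  stopifnot(is.numeric(x), !anyNA(x), k >= 1, k <= length(x))
  repeat {
    pivot <- x[sample.int(length(x), 1L)]
    lower <- x[x < pivot]             # values strictly below the pivot
    n_equal <- sum(x == pivot)        # at least 1: the pivot itself
    if (k <= length(lower)) {
      x <- lower                      # answer lies below the pivot
    } else if (k <= length(lower) + n_equal) {
      return(pivot)                   # answer equals the pivot
    } else {
      k <- k - length(lower) - n_equal
      x <- x[x > pivot]               # answer lies above the pivot
    }
  }
}

# Hypothetical helper: median via selection, matching median()'s
# convention of averaging the two middle values when n is even.
select_median <- function(x) {
  n <- length(x)
  if (n %% 2L == 1L) {
    quickselect(x, (n + 1L) %/% 2L)
  } else {
    (quickselect(x, n %/% 2L) + quickselect(x, n %/% 2L + 1L)) / 2
  }
}

set.seed(1)
x <- rnorm(1001)
identical(select_median(x), median(x))  # TRUE
```

For even lengths the two results can differ in the last bit, because median() averages the middle pair with mean(), so compare them with all.equal() rather than identical().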

> In any case, it looks like you are not constrained by the median
> algorithm, but by the number of calls. You might do a lot better with
> apply, though
>> apply(df,2,median)
>
> On my system 200k columns were processed in negligible time by apply
> and I'm still waiting for mapply.

I'd also note that this is the sort of problem where the profiler is useful: you can see on a smaller subset whether R is spending most of its time in median() or somewhere else.
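
A sketch of that workflow with base R's Rprof() and summaryRprof(), run on a hypothetical smaller problem (the matrix size and sampling interval are made-up choices). The by.self table ranks functions by time spent in their own code, and by.total ranks them including everything they call. Together they show whether median() or something else dominates.

```r
# Profile a smaller, hypothetical version of the problem:
# column medians of a 100 x 50000 numeric matrix.
set.seed(1)
m <- matrix(rnorm(100 * 50000), nrow = 100)

prof_file <- tempfile(fileext = ".out")
Rprof(prof_file, interval = 0.005)   # start the sampling profiler
med <- apply(m, 2, median)
Rprof(NULL)                          # stop profiling

prof <- summaryRprof(prof_file)
# Time spent in each function's own code
print(head(prof$by.self, 10))
# Time including everything each function calls
print(head(prof$by.total, 10))
unlink(prof_file)
```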

I wouldn't be surprised if a while() loop were even faster than apply() in this setting, but probably not enough to care about.
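
A sketch of that comparison on a hypothetical numeric matrix `m` (sizes made up). Both approaches compute column medians, identical() confirms they agree, and system.time() lets you compare them on your own machine. Note that apply() on a data frame first converts it to a matrix with as.matrix(); for a data frame, sapply(df, median) iterates over the columns directly.

```r
set.seed(42)
m <- matrix(rnorm(100 * 2000), nrow = 100)  # hypothetical data: 2000 columns

# apply() over columns (MARGIN = 2)
med_apply <- apply(m, 2, median)

# The same computation with an explicit while() loop and a preallocated result
med_loop <- numeric(ncol(m))
j <- 1L
while (j <= ncol(m)) {
  med_loop[j] <- median(m[, j])
  j <- j + 1L
}

stopifnot(identical(med_apply, med_loop))

# Compare timings locally; results vary by machine and R version
print(system.time(apply(m, 2, median)))
print(system.time({
  out <- numeric(ncol(m))
  j <- 1L
  while (j <= ncol(m)) {
    out[j] <- median(m[, j])
    j <- j + 1L
  }
}))
```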

       -thomas

Thomas Lumley			Assoc. Professor, Biostatistics
tlumley at u.washington.edu	University of Washington, Seattle