[R] single-pass algorithm for quantile calculation

Prof Brian D Ripley ripley at stats.ox.ac.uk
Tue Apr 3 20:54:16 CEST 2001


On Tue, 3 Apr 2001, Vadim Ogranovich wrote:

> Dear R users, I am looking for a reference to an algorithm for estimation of
> sample quantiles which does not require bringing the whole data into memory
> (more precisely its memory complexity should be much less than linear,
> ideally constant). I realize that such an algorithm can only be approximate
> and actually quite wrong for some samples, but that's fine with me.

There is no approximately accurate algorithm (consider a sorted dataset)
that does not look at the whole file.

Try random subsampling (in whatever database holds your file, or by
bringing in chunks and sampling while in memory).

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272860 (secr)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-help-request at stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._



More information about the R-help mailing list