[R] Median of streaming data

Rolf Turner r.turner at auckland.ac.nz
Wed Sep 24 08:43:34 CEST 2014


On 24/09/14 17:31, Mohan Radhakrishnan wrote:
> Hi,
>
> I have streaming data (1 TB) that can't fit in memory. Is there a
> way for me to find the median of these streaming integers, assuming I can
> fit only a small part in memory? This is about the statistical approach to
> finding the median of a large number of values when I can inspect only a part
> of them due to memory constraints.

You cannot, I'm pretty sure, calculate the exact median recursively, i.e.
by updating a running value one observation at a time without storing the
data.  However there are "approximate" recursive median algorithms which
provide an estimate of location that has the same asymptotic properties
as the median.

See:

* U. Holst, Recursive estimators of location, Commun. Statist. Theory
Meth., vol. 16, 1987, pp. 2201--2226.

and

* Murray A. Cameron and T. Rolf Turner, Recursive location and scale 
estimators, Commun. Statist. Theory Meth., vol. 22, 1993,
pp. 2503--2515.
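
To give a rough idea of what "recursive" means here, below is a minimal
sketch of a generic stochastic-approximation update: nudge the current
estimate up when the new value lies above it and down when it lies below,
with steps shrinking like 1/n.  This is NOT claimed to be the estimator
from either paper above.  The function name recursive_median and the
tuning constant step are my own assumptions, and the code is in Python
purely for illustration (the loop translates directly to R).

```python
def recursive_median(stream, step=1.0):
    """One-pass, constant-memory estimate of the median of `stream`.

    Illustrative sketch only: `step` is a hypothetical tuning constant
    and should be on the order of the spread of the data, otherwise
    convergence can be very slow.
    """
    estimate = None
    n = 0
    for x in stream:
        n += 1
        if estimate is None:
            # Start from the first observation.
            estimate = float(x)
            continue
        # Move towards the side on which the new value falls;
        # the step size decays like 1/n.
        if x > estimate:
            estimate += step / n
        elif x < estimate:
            estimate -= step / n
    return estimate


if __name__ == "__main__":
    import random

    random.seed(1)
    # Simulated stream; a generator, so only one value is held at a time.
    values = (random.gauss(50.0, 10.0) for _ in range(100000))
    print(recursive_median(values, step=20.0))
```

Only the current estimate and a counter are kept in memory, so the size
of the stream does not matter.  The price is that the result is an
estimate rather than the exact median, and its quality depends on the
choice of step.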

cheers,

Rolf Turner

-- 
Rolf Turner
Technical Editor ANZJS


