[R] Median of streaming data

Rolf Turner r.turner at auckland.ac.nz
Thu Sep 25 01:44:38 CEST 2014


On 24/09/14 20:16, Martin Maechler wrote:

<SNIP>

> 1) has your proposal ever been provided in R?
>     I'd be happy to add it to the robustX
>     (http://cran.ch.r-project.org/web/packages/robustX) or even
>     robustbase (http://cran.ch.r-project.org/web/packages/robustbase) package.

<SNIP>

I have coded up the algorithm from the Cameron and Turner paper.  Dunno 
if it gives exactly the same results as my (Splus?) code from lo these 
many years ago (the code that is lost in the mists of time), but it 
*seems* to work.

It is not designed to work with actual "streaming" data --- I don't know 
how to do that.  It takes a complete data vector as input.  Someone who 
knows about streaming data should be able to adapt it pretty easily. 
Said he, the proverbial optimist.

The function code and a help file are attached.  These files have had 
their names changed to end in ".txt" so that they will get through the 
mailing list processor without being stripped.  With a bit of luck.

If they *don't* get through, anyone who is interested should contact me 
and I will send them to you "privately".

cheers,

Rolf

-- 
Rolf Turner
Technical Editor ANZJS
-------------- next part --------------
rlas <- function(y,b=0.2,mfn=function(n){0.1*n^(-0.25)},
                 nstart=30,scon=NULL) {
# Initialize:
y0    <- y[1:nstart]
alpha <- median(y0)
s     <- if(is.null(scon)) mean(abs(y0-alpha)) else scon
mu    <- mfn(nstart)/s
eps   <- s/nstart^b
kount <- sum(abs(alpha-y0) < eps)
g     <- kount/(eps*nstart)
ny    <- length(y)
n     <- nstart+1
locn  <- numeric(ny)
locn[1:(nstart-1)] <- NA
locn[nstart] <- alpha
scale  <- numeric(ny)
scale[1:(nstart-1)] <- NA
scale[nstart] <- s

# Calculate recursively:
while(n <= ny) {
   s     <- if(is.null(scon)) ((n-1)*s + abs(y[n]-alpha))/n else scon
   mu    <- mfn(n)/s
   eye   <- if(abs(alpha - y[n]) < s/n^b) 1 else 0
   g     <- (1-1/n)*g + n^(b-1)*eye/s
   a     <- max(mu,g)
   alpha <- alpha + sign(y[n]-alpha)/(a*n)
   locn[n]  <- alpha
   scale[n] <- s
   n <- n+1
}
list(locn=locn,scale=scale)
}
-------------- next part --------------
\name{rlas}
\alias{rlas}
\title{
  Recursive location and scale.
}
\description{
  Calculate a recursive estimate of location, asymptotically
  equivalent to the median, and a recursive estimate of scale
  equal to the \bold{MEAN} absolute deviation.
}
\usage{
rlas(y, b = 0.2, mfn = function(n){0.1 * n^(-0.25)},
     nstart = 30, scon = NULL)
}
\arguments{
  \item{y}{
  A numeric vector of i.i.d. data whose location and scale
  parameters are to be estimated.
}
  \item{b}{
  A tuning parameter (default value equal to that used by
  Holst, 1987).
}
  \item{mfn}{
  Function of the index of the data which must be positive and
  and tend to 0 as the index tends to infinity.  The default
  function is that used by Holst, 1987.
}
  \item{nstart}{
  Starting values for the algorithm are formed from the first
  \code{nstart} values of \code{y}.
}
  \item{scon}{
  A constant value for the scale parameter \code{s}. If this
  is provided then the algorithm becomes equivalent to the
  algorithm of Holst, 1987.
}
}
\value{
A list with entries
  \item{locn}{The successive recursive estimates of location.  The
  first \code{nstart - 1} of these are \code{NA}.}
  \item{scale}{The successive recursive estimates of scale. If
  \code{scon} is specified these are all equal to \code{scon}.}
}
\references{
Cameron, Murray A. and Turner, T. Rolf (1993). Recursive location and
scale estimators. \emph{Commun. Statist. --- Theory Meth.} \bold{22}
(9) 2503--2515.

Holst, U. (1987). Recursive estimators of location.
\emph{Commun. Statist. --- Theory Meth.} \bold{16} (8) 2201--2226.
}
\author{
 \email{r.turner at auckland.ac.nz}
 \url{http://www.stat.auckland.ac.nz/~rolf}
}
\examples{
set.seed(42)
y  <- rnorm(10000)
z1 <- rlas(y)
z2 <- rlas(y,scon=1)
z3 <- rlas(y,scon=100)
OP <- par(mfrow=c(3,1))
plot(z1$locn,type="l")
abline(h=median(y),col="red")
plot(z2$locn,type="l")
abline(h=median(y),col="red")
plot(z3$locn,type="l")
abline(h=median(y),col="red")
par(OP)
}
}
\keyword{ univar }
\keyword{ robust }


More information about the R-help mailing list