[R] Percentiles/Quantiles with Weighting

Stavros Macrakis macrakis at alum.mit.edu
Tue Feb 17 23:48:49 CET 2009


Here is one kind of weighted quantile function.

The basic idea is very simple:

wquantile <- function( v, w, p )
  {
    o <- order(v)   # compute the ordering once, before v is overwritten
    v <- v[o]
    w <- w[o]       # each weight stays with its observation
    v [ which.max( cumsum(w) / sum(w) >= p ) ]
  }
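
To make the steps concrete, here is a small walk-through on made-up
numbers (the data and the variable names are only for illustration).
The ordering is computed once and applied to both vectors, so each
weight travels with its observation:

```r
# Made-up data: three observations with unequal weights
v <- c(30, 10, 20)
w <- c(1, 2, 1)

o        <- order(v)              # positions of v in increasing order
v_sorted <- v[o]                  # observations, smallest first
w_sorted <- w[o]                  # weights in the same order
frac     <- cumsum(w_sorted) / sum(w_sorted)   # cumulative share of weight

# The weighted quantile at p is the first sorted observation whose
# cumulative share of the total weight reaches p.
q50 <- v_sorted[ which.max( frac >= 0.5 ) ]
q60 <- v_sorted[ which.max( frac >= 0.6 ) ]
```

Here the smallest observation carries half of the total weight, so it
is already the weighted median, while p = 0.6 moves on to the next
observation.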

With some more error-checking and general clean-up, it looks like this:

# Simple weighted quantile
#
# v  A numeric vector of observations
# w  A numeric vector of non-negative weights (default: equal weights)
# p  A numeric vector of probabilities, each 0<=p<=1 (default: the median)
#
# Nothing fancy: no interpolation etc.

wquantile <- function(v,w=rep(1,length(v)),p=.5)
   {
     if (!is.numeric(v) || !is.numeric(w) || length(v) != length(w))
       stop("Values and weights must be equal-length numeric vectors")
     if ( !is.numeric(p) || any( p<0 | p>1 ) )
       stop("Quantiles must be 0<=p<=1")
     # Check every weight, not just the first one
     if ( any( is.na(w) | w<0 ) ) stop("Weights must be non-negative numbers")
     if ( sum(w) == 0 ) stop("At least one weight must be positive")
     ranking <- order(v)
     sumw <- cumsum(w[ranking])
     # Cumulative share of the total weight, in order of increasing v
     plist <- sumw/sumw[length(sumw)]
     # For each p, the first observation whose cumulative share reaches p
     sapply(p, function(p) v [ ranking [ which.max( plist >= p ) ] ])
   }
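
One way to sanity-check this kind of function is to compare it, for
integer weights, with the replicate-the-data approach from the quoted
message below. The sketch uses a compact stand-in helper (the name wq
and the data are made up for the check) and compares against
quantile() with type = 1, which also does no interpolation:

```r
# Hypothetical compact helper, for this check only: the smallest
# observation whose cumulative share of the total weight reaches p.
wq <- function(v, w, p) {
  o <- order(v)
  share <- cumsum(w[o]) / sum(w)
  sapply(p, function(q) v[o][ which.max( share >= q ) ])
}

x <- c(5, 3, 8, 1, 9, 2, 7, 4, 10, 6)   # made-up data, oldest first
w <- seq_along(x)                        # weight i for the i-th observation
p <- c(0.1, 0.25, 0.5, 0.9)

weighted   <- wq(x, w, p)
# Replicating x[i] w[i] times and taking the type-1 (inverse empirical
# CDF) quantile should pick the same observations.
replicated <- quantile(rep(x, times = w), p, type = 1, names = FALSE)

all.equal(weighted, replicated)
```

The probabilities are chosen so that none lands exactly on a
cumulative-weight boundary, where floating-point rounding could make
the two approaches disagree.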

I would appreciate any comments people have on this -- whether
correctness, efficiency, style, ....

              -s


On Tue, Feb 17, 2009 at 11:57 AM, Brigid Mooney <bkmooney at gmail.com> wrote:
> Hi All,
>
> I am looking at applications of percentiles to time sequenced data.  I had
> just been using the quantile function to get percentiles over various
> periods, but am more interested in if there is an accepted (and/or
> R-implemented) method to apply weighting to the data so as to weigh recent
> data more heavily.
>
> I wrote the following function, but it seems quite inefficient, and not
> really very flexible in its applications - so if anyone has any suggestions
> on how to look at quantiles/percentiles within R while also using a
> weighting schema, I would be very interested.
>
> Note - this function supposes the data in X is time-sequenced, with the most
> recent (and thus heaviest weighted) data at the end of the vector
>
> WtPercentile <- function(X=rnorm(100), pctile=seq(.1,1,.1))
> {
>  Xprime <- NA
>
>  for(i in 1:length(X))
>  {
>    Xprime <- c(Xprime, rep(X[i], times=i))
>  }
>
>  print("Percentiles:")
>  print(quantile(X, pctile))
>  print("Weighted:")
>  print(Xprime)
>  print("Weighted Percentiles:")
>  print(quantile(Xprime, pctile, na.rm=TRUE))
> }
>
> WtPercentile(1:10)
> WtPercentile(rnorm(10))