[Rd] Adding weights to ecdf

Fri Jan 11 16:19:35 CET 2008

I would like you consider that the function ecdf
could be extended in the following way to handle weights
when computing Empirical distribution Functions. There
exist particular cases that supports this kind of
extension, see for example:

Rao, C. R., 1997.
Statistic and True. Putting chance to work.
World Scientific Publishing.

Cox, D. R., 1969.
Some Sampling Problems in Technology.
New Developments in Survey Sampling, pg. 506--527
John Wiley, New York.

Vardi, Y., 1982.
Nonparametric estimation in the presence of length bias
The Annals of Statistics, vol. 10, num. 2 , pg. 616--620.

Gill, Richard D. and Vardi, Yehuda and Wellner, Jon A., 1988.
Large sample theory of empirical distributions in biased sampling models.
The Annals of Statistics, vol. 16, num. 3, pg. 1069--1112.

and all the references there in.

In this case, and for observations x=(x1,x2, ... xn), with
weights weig=(w1, w2,..., wn), Fn(t) is the addition of the
normalized weights nwi=wi/sum(weig) for all i such that
xi observations less or equal to t i.e.,

Fn(t) = w1/(sum(i=1,n)wi) sum(i=1,n) Indicator(xi <= t).

The code I suggest to use is the following

newEcdf <- function (x,weig=rep(1,length(x)))
#	x = vector with observations.
#	weig = weigths, one per observation.
{
     n <- length(x)
     if (n < 1)
         stop("'x' have 1 or more non-missing values")
	if (n != length(weig))
         stop("'weig' must have same length than 'x'")
     xPerm <- order(x)
     weig <- weig/sum(weig)
     x <- x[xPerm]
     weig <- weig[xPerm]
	ndIdx <- !duplicated(x)
     vals <- x[ndIdx]
     weig <- sapply(x[ndIdx],function(xx) sum(weig[x==xx]))
     rval <- approxfun(vals, cumsum(weig),
         method = "constant", yleft = 0, yright = 1, f = 0, ties = 
"ordered")
     class(rval) <- c("ecdf", "stepfun", class(rval))
     attr(rval, "call") <- sys.call()
     rval
}

and as a particular example, to see how the E.C.D.F. behaves
in the length--biased framework consider running the following
function with different values for n

lengthBiasExample <- function(n=100)
{
	distFun <- function(x) pgamma(x,shape=1,rate=1)
	##lenght biased version of a gamma(shape=1,rate=1)
     x <- rgamma(n,shape=2,rate=1)
	par(ask=T)
	Fx <- ecdf(x)
	newFx <- newEcdf(x,1/x)
	plot(Fx,pch="+")
	curve(distFun,from=0,to=10,add=T,col="red")
	plot(newFx,pch="+")
	curve(distFun,from=0,to=10,add=T,col="red")
	print(summary(newFx))
	par(ask=F)
}

Thank you very much in advance, Jorge.

Jorge Luis Ojeda Cabrera
Prof. Asoc. del Dep. Métodos Estadísticos.

Univ. de Zaragoza.
Fac. de Ciencias, Edif. Matemáticas.
Pedro Cerbuna, num. 12.
50009 Zaragoza.
Spain.

Tlf.:    34 76 762884
Fax:    34 76 761115
jojeda at unizar.es