[R] maximum difference between two ECDFs

Bart Vandewoestyne Bart.Vandewoestyne at telenet.be
Thu Jun 28 11:42:28 CEST 2007


Hello,

I have a vector of samples x of length N.  Associated with each
sample x_i is a certain weight w_i.  All the weights are in another
vector w of the same length N.

I have another vector of samples y of length n (small n).  All
these samples have equal weights 1/n.  The ECDF of these samples
is defined, for example, at
http://en.wikipedia.org/wiki/Empirical_distribution_function and
I can compute it using the ecdf() function in R.

I define the 'ECDF' of the samples x with their associated
weights in the following way:

F_N(x) = 1/N * sum_{i=1}^{N}w_i * Indicator(x_i <= x)

(does this 'ECDF' have another name???)

So it's basically the same formula as the one at the above URL; the
only difference is that I multiply the indicator function for x_i by
the weight w_i.
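The weighted definition above can be transcribed directly.  Below is a
minimal sketch in Python rather than R, assuming x and w are plain
equal-length sequences; the name weighted_ecdf and the example data are
hypothetical.  Note that F_N(x) only reaches 1 for large x when the
weights sum to N.  The post does not say whether that holds, so the
example simply assumes it.

```python
def weighted_ecdf(x, w, t):
    """Evaluate F_N(t) = 1/N * sum_i w_i * I(x_i <= t).

    x, w: equal-length sequences of samples and their weights.
    t:    the point at which F_N is evaluated.
    """
    if len(x) != len(w):
        raise ValueError("x and w must have the same length")
    n_total = len(x)
    # Add up the weights of all samples at or below t, then scale by 1/N.
    return sum(wi for xi, wi in zip(x, w) if xi <= t) / n_total


# Hypothetical data.  Assumption: the weights sum to N = 4, so F_N climbs to 1.
x = [0.5, 1.0, 2.0, 3.0]
w = [0.5, 1.5, 1.0, 1.0]
print(weighted_ecdf(x, w, 1.0))   # (0.5 + 1.5) / 4 = 0.5
print(weighted_ecdf(x, w, 10.0))  # 4.0 / 4 = 1.0
```

Evaluating it this way costs O(N) per point.  That is fine for a
handful of points but wasteful when F_N is needed at many points.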

Now suppose F_n(x) is the ECDF of the n samples with equal
weights 1/n, and F_N(x) is the 'ECDF' of the other samples with
their associated weights.

What I would now like to compute is the maximum absolute difference
between these two, i.e.:

max(abs(F_N(x)-F_n(x)))
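One way to compute this: F_N and F_n are both right-continuous step
functions that only jump at sample points.  Their difference is
therefore constant between consecutive points of the pooled sample
(and 0 to the left of all of them).  So it is enough to evaluate both
functions at every x_i and y_j and take the largest absolute
difference.  Below is a minimal Python sketch of that idea (not an R
function; max_ecdf_diff is a hypothetical name).  It sorts x together
with its weights and uses cumulative weight sums plus binary search, so
each evaluation is O(log N).

```python
from bisect import bisect_right
from itertools import accumulate


def max_ecdf_diff(x, w, y):
    """Return max_t |F_N(t) - F_n(t)|, where
    F_N(t) = 1/N * sum_i w_i * I(x_i <= t)  (weighted 'ECDF' of x)
    F_n(t) = 1/n * sum_j I(y_j <= t)        (ordinary ECDF of y).
    """
    if len(x) != len(w):
        raise ValueError("x and w must have the same length")
    big_n, small_n = len(x), len(y)
    # Sort x together with its weights; cum_w[k] = total weight of the k smallest x.
    pairs = sorted(zip(x, w))
    xs = [p[0] for p in pairs]
    cum_w = [0.0] + list(accumulate(p[1] for p in pairs))
    ys = sorted(y)

    best = 0.0
    # The difference only changes at pooled sample points, so check each one.
    for t in set(xs) | set(ys):
        f_big = cum_w[bisect_right(xs, t)] / big_n  # weights of x_i <= t, over N
        f_small = bisect_right(ys, t) / small_n     # count of y_j <= t, over n
        best = max(best, abs(f_big - f_small))
    return best


x = [0.5, 1.0, 2.0, 3.0]
w = [0.5, 1.5, 1.0, 1.0]  # hypothetical weights summing to N = 4
y = [1.5, 2.5]
print(max_ecdf_diff(x, w, y))  # largest gap is at t = 1.0: |0.5 - 0| = 0.5
```

The same pooled-points argument should carry over directly to an R
implementation.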

So it's like computing the Kolmogorov-Smirnov statistic of two
discrete CDFs.

If I didn't have these weights, or if one of the two were a
continuous CDF, I could simply use the ks.test() function.
However, my situation is different: my first set of samples has
associated weights, and therefore its 'ECDF' has a slightly
different definition.
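With all weights w_i equal to 1, F_N is just the ordinary ECDF of x.
The quantity above is then the usual two-sample KS statistic D, which
is what ks.test(x, y) reports as its statistic.  For comparison, here
is a Python sketch of that unweighted case using a merge-style sweep
over the two sorted samples (ks_two_sample_d is a hypothetical name).
Running a weighted routine on the same x and y with unit weights should
give the same value, which makes a cheap sanity check.

```python
def ks_two_sample_d(x, y):
    """Unweighted two-sample KS statistic D = max_t |F_x(t) - F_y(t)|."""
    xs, ys = sorted(x), sorted(y)
    m, n = len(xs), len(ys)
    i = j = 0
    d = 0.0
    while i < m and j < n:
        t = min(xs[i], ys[j])
        # Step past every sample equal to t in both vectors, so ties are
        # counted before the two ECDFs are compared at t.
        while i < m and xs[i] == t:
            i += 1
        while j < n and ys[j] == t:
            j += 1
        d = max(d, abs(i / m - j / n))
    # Once one sample is exhausted its ECDF is 1 and the gap can only shrink.
    return d


print(ks_two_sample_d([1, 2, 3], [2.5]))  # gap at t = 2: |2/3 - 0| = 2/3
```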

How can I compute max(abs(F_N(x)-F_n(x)))?  Are there
standard functions for this?

Thanks,
Bart

-- 
	"Share what you know.  Learn what you don't."



More information about the R-help mailing list