[R] What ECDF function?

Robert A LaBudde ral at lcfltd.com
Sat Jun 9 22:26:09 CEST 2007


At 12:57 PM 6/9/2007, Marco wrote:
><snip>
>2. I found various versions of the P-P plot which, instead of using the
>"ecdf" function, use ((1:n)-0.5)/n
>   After investigating, I found there are different definitions of the
>ECDF (note "i" is the rank):
>   * Kaplan-Meier: i/n
>   * modified Kaplan-Meier: (i-0.5)/n
>   * Median Rank: (i-0.3)/(n+0.4)
>   * Herd-Johnson: i/(n+1)
>   * ...
>   Furthermore, similar expressions are used by "ppoints".
>   So,
>   2.1 For P-P plot, what shall I use?
>   2.2 In general why should I prefer one kind of CDF over another one?
><snip>

This is an age-old debate in statistics. There are many different 
formulas, some of which are optimal for particular distributions.

Using i/n (which I would call the Kolmogorov method), (i-1)/n or
i/(n+1) is to be discouraged for general ECDF modeling. These are
comparable in quality to integrating over the bins with the
rectangular rule, and assume only that the underlying density
function is piecewise constant. There is no disadvantage to using
these methods, however, if the pdf has multiple discontinuities.
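
For reference, a minimal R sketch of how these candidates relate to
the step function returned by ecdf(). The sample is made up for
illustration and is assumed to be continuous with no ties; the
variable names are arbitrary.

```r
# Hypothetical sample; any continuous data without ties would do.
set.seed(1)
x <- rnorm(20)
n <- length(x)
i <- seq_len(n)            # ranks of the sorted observations

Fn <- ecdf(x)              # base R right-continuous step function

p_upper <- i / n           # height of the step at each sorted point
p_lower <- (i - 1) / n     # height just before each sorted point
p_mid   <- (i - 0.5) / n   # halfway up each jump

# ecdf() evaluated at the sorted data reproduces i/n
all.equal(Fn(sort(x)), p_upper)
```

The (i-0.5)/n values sit halfway between the two one-sided choices,
which is the sense in which they split each jump of the step function.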

I tend to use (i-0.5)/n. This corresponds to integrating with the
"midpoint rule", a 1-point Gaussian quadrature that is exact when
the integrand is linear (with continuous derivative) over the bin.
It's simple, it's accurate, and it is near optimal for a wide range
of continuous alternatives.
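
A P-P plot built on (i-0.5)/n might look like the sketch below. The
data are simulated, and the choice of a normal reference distribution
fitted by mean() and sd() is an assumption made only for the example;
substitute whatever distribution you are checking.

```r
# Hypothetical data for illustration only.
set.seed(42)
x <- rnorm(50, mean = 10, sd = 2)
n <- length(x)

# Empirical probabilities; identical to ppoints(n, a = 0.5)
p_emp <- (seq_len(n) - 0.5) / n

# Fitted normal CDF evaluated at the sorted data (assumed reference model)
p_theo <- pnorm(sort(x), mean = mean(x), sd = sd(x))

plot(p_theo, p_emp,
     xlab = "Theoretical probability",
     ylab = "Empirical probability",
     main = "P-P plot")
abline(0, 1, lty = 2)      # points near this line indicate a good fit
```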

The formula (i-3/8)/(n+1/4) is optimal for the normal
distribution. However, it differs from (i-0.5)/n by less than
1/(8n), with the largest gap at the extreme ranks and almost none at
the middle ranks, so there is no real benefit to using it. Similarly,
there is a formula (i-0.44)/(n+0.12) for a Gumbel distribution. If
you know for sure (don't need to test) the form of the distribution,
you're better off fitting that distribution function directly and not
worrying about the edf.
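
These formulas are all members of the family (i-a)/(n+1-2a), which is
what ppoints(n, a) computes. The sketch below uses a hypothetical
helper, plot_pos(), to put them side by side and measure how far the
distribution-specific versions stray from (i-0.5)/n; n = 20 is an
arbitrary choice.

```r
# Hypothetical helper: plotting positions (i - a) / (n + 1 - 2a)
plot_pos <- function(n, a) (seq_len(n) - a) / (n + 1 - 2 * a)

n <- 20
midpoint  <- plot_pos(n, 0.5)    # (i - 0.5)  / n
pp_normal <- plot_pos(n, 3/8)    # (i - 3/8)  / (n + 1/4)
pp_gumbel <- plot_pos(n, 0.44)   # (i - 0.44) / (n + 0.12)

# The helper agrees with base R's ppoints()
all.equal(pp_normal, ppoints(n, a = 3/8))

# Largest gaps from the midpoint formula; they occur at i = 1 and i = n
max(abs(pp_normal - midpoint))
max(abs(pp_gumbel - midpoint))
```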

Also remember that edfs are not very accurate, so the differences 
between these formulae are difficult to justify in practice.
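
One rough way to put numbers on this: at a point where the true CDF
equals p, n times the edf is binomial(n, p), so the edf has standard
error sqrt(p(1-p)/n). The sketch below compares that with the largest
gap between two of the formulas; n = 100 and the three values of p
are arbitrary choices for illustration.

```r
n <- 100
i <- seq_len(n)

# Standard error of the edf where the true CDF equals p
p  <- c(0.1, 0.5, 0.9)
se <- sqrt(p * (1 - p) / n)

# Largest gap between (i - 3/8)/(n + 1/4) and (i - 0.5)/n
gap <- max(abs((i - 3/8) / (n + 1/4) - (i - 0.5) / n))

se       # sampling error of the edf itself
gap      # difference attributable to the choice of formula
```

With these settings the sampling error is more than ten times the gap
between the formulas.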

================================================================
Robert A. LaBudde, PhD, PAS, Dpl. ACAFS  e-mail: ral at lcfltd.com
Least Cost Formulations, Ltd.            URL: http://lcfltd.com/
824 Timberlake Drive                     Tel: 757-467-0954
Virginia Beach, VA 23464-3239            Fax: 757-467-2947

"Vere scire est per causas scire"


