[R] What ECDF function?

Shiazy Fuzzy shiazy at gmail.com
Sun Jun 10 00:36:05 CEST 2007


On 6/9/07, Robert A LaBudde <ral at lcfltd.com> wrote:
> At 12:57 PM 6/9/2007, Marco wrote:
> ><snip>
> >2. I found various versions of the P-P plot that, instead of the
> >"ecdf" function, use ((1:n)-0.5)/n.
> >   After investigation I found there are different definitions of the ECDF
> >(note "i" is the rank):
> >   * Kaplan-Meier: i/n
> >   * modified Kaplan-Meier: (i-0.5)/n
> >   * Median Rank: (i-0.3)/(n+0.4)
> >   * Herd-Johnson: i/(n+1)
> >   * ...
> >   Furthermore, similar expressions are used by "ppoints".
> >   So,
> >   2.1 For P-P plot, what shall I use?
> >   2.2 In general why should I prefer one kind of CDF over another one?
> ><snip>
>
> This is an age-old debate in statistics. There are many different
> formulas, some of which are optimal for particular distributions.
>
> Using i/n (which I would call the Kolmogorov method), (i-1)/n or
> i/(n+1) is to be discouraged for general ECDF modeling. These
> correspond in quality to the rectangular rule method of integration
> of the bins, and assume only that the underlying density function is
> piecewise constant. There is no disadvantage to using these methods,
> however, if the pdf has multiple discontinuities.
>
> I tend to use (i-0.5)/n, which corresponds to integrating with the
> "midpoint rule", which is a 1-point Gaussian quadrature, and which is
> exact for linear behavior with a continuous derivative. It's simple,
> it's accurate, and it is near optimal for a wide range of continuous
> alternatives.
>

Hmmm I'm a bit confused, but very interested!
So you don't use the R "ecdf", do you?
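
For reference, here is a minimal sketch of how the step function returned
by ecdf() relates to the (i-0.5)/n positions and to ppoints(). The sample
x, the seed, and the choice of a fitted normal as the reference
distribution are assumptions made only for illustration.

```r
set.seed(1)                    # arbitrary seed, only for reproducibility
x <- rnorm(20)                 # made-up sample (assumption, not real data)
n <- length(x)
i <- rank(x)                   # rank of each observation (no ties here)

p_ecdf <- ecdf(x)(x)           # step ECDF evaluated at the data: i/n
p_mid  <- (i - 0.5) / n        # midpoint positions, in the order of x

# ppoints(n, a) returns (1:n - a)/(n + 1 - 2*a) for the sorted sample;
# a = 0.5 reproduces ((1:n) - 0.5)/n
p_pp <- ppoints(n, a = 0.5)

# P-P plot: probabilities under the fitted normal vs. empirical positions
p_theo <- pnorm(sort(x), mean = mean(x), sd = sd(x))
if (interactive()) {
  plot(p_theo, p_pp,
       xlab = "Theoretical probability", ylab = "Empirical probability")
  abline(0, 1)                 # reference line for a perfect fit
}
```

Note that ppoints(n) without the "a" argument switches its default
between a = 3/8 (n <= 10) and a = 1/2 (n > 10), so passing "a"
explicitly keeps the formula fixed.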

> The formula (i- 3/8)/(n + 1/4) is optimal for the normal
> distribution. However, it is equal to (i-0.5)/n to order 1/n^3, so
> there is no real benefit to using it. Similarly, there is a formula
> (i-0.44)/(n+0.12) for a Gumbel distribution. If you do know for sure
> (don't need to test) the form of the distribution, you're better off
> fitting that distribution function directly and not worrying about the edf.
>
> Also remember that edfs are not very accurate, so the differences
> between these formulae are difficult to justify in practice.
>

I will bear that in mind! My first interpretation was that using something
different from i/n (e.g. i/(n+1))
makes it easier to detect differences in the tails (maybe...)
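
One way to look at this is to tabulate the formulas side by side for the
smallest and largest ranks. The sketch below assumes n = 20 (an arbitrary
choice) and uses ppoints() with an explicit "a" for every formula except
i/n; the column labels are only illustrative names.

```r
n <- 20                        # arbitrary sample size (assumption)
i <- 1:n                       # ranks of the sorted sample

# ppoints(n, a) computes (i - a)/(n + 1 - 2*a)
pos <- cbind(
  "i/n"             = i / n,
  "i/(n+1)"         = ppoints(n, a = 0),
  "(i-0.5)/n"       = ppoints(n, a = 0.5),
  "(i-0.3)/(n+0.4)" = ppoints(n, a = 0.3),
  "(i-3/8)/(n+1/4)" = ppoints(n, a = 3/8)
)

# positions assigned to the two smallest and two largest observations
tails <- round(pos[c(1, 2, n - 1, n), ], 4)
print(tails)

# largest absolute gap between each formula and (i-0.5)/n
max_gap <- apply(abs(pos - pos[, "(i-0.5)/n"]), 2, max)
print(max_gap)
```

The i/n column assigns probability 1 to the largest observation, which a
quantile function such as qnorm() maps to Inf, while the other formulas
stay strictly between 0 and 1.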

Regards,

-- Marco

> ================================================================
> Robert A. LaBudde, PhD, PAS, Dpl. ACAFS  e-mail: ral at lcfltd.com
> Least Cost Formulations, Ltd.            URL: http://lcfltd.com/
> 824 Timberlake Drive                     Tel: 757-467-0954
> Virginia Beach, VA 23464-3239            Fax: 757-467-2947
>
> "Vere scire est per causas scire"


