[R] ECDF, distribution of Pareto, distribution of Normal

Wed Jul 11 16:19:50 CEST 2007

livia wrote:
> Hello all,
> 
> I would like to plot the empirical CDF, normal CDF and Pareto CDF in the
> same graph and I am using the following code. "z" is a vector and I just
> need the part where z is between 1.6 and 3.
> 
> plot(ecdf(z), do.points=FALSE, verticals=TRUE,
> xlim=c(1.6,3),ylim=c(1-sum(z>1.6)/length(z), 1))
> 
> x <- seq(1.6, 3, 0.1)
> lines(x,pgpd(x, 1.544,0.4373,-0.2398), col="red")
> 
> y <- seq(1.6, 3, 0.1)
> lines(y,pnorm(y, mean(z),sqrt(var(z))), col="blue")
> 
> The empirical CDF and normal CDF look rather reasonable, but the Pareto CDF
> looks quite odd. I am not sure whether I plot the Pareto CDF correctly, e.g.
> on the right y-axis scale, or whether I have made some other mistake?
> 
> At the same time, let "t" represent the vector whose values are larger than
> 1.6 (the part we want). If I run the following code and plot the
> empirical CDF and Pareto CDF, the Pareto CDF seems to fit.
> 
> plot(ecdf(t), do.points=FALSE, verticals=TRUE)
> x <- seq(1.6, 3, 0.1)
> lines(x,pgpd(x, 1.544,0.4373,-0.2398), col="red")
> 
> Could anyone give me some advice on this? Many thanks.

If any of your data points are less than 1.6, ecdf(z) and ecdf(t)
will be different functions: for arguments greater than 1.6,
the former will take values in the range (mean(z<1.6), 1) and the
latter will cover the full range (0, 1).  It is not surprising that
your pgpd curve fits only one of these empirical CDFs closely.

Assuming that those GPD parameters were obtained by fitting to just
the data values greater than 1.6, the GPD curve in your first plot
should be

   u <- mean(z < 1.6)    # proportion of the data below the threshold
   x <- seq(1.6, 3, 0.1)
   lines(x, u + (1-u)*pgpd(x, <parameters>))
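
The rescaling can be checked numerically. The following is only a minimal sketch under stated assumptions: the data are simulated, pgpd_sketch is a hypothetical hand-written GPD cdf standing in for whichever package supplies pgpd, and the scale and shape values are illustrative rather than fitted. It shows that, above the threshold, ecdf(z) equals u + (1-u) times the ecdf of the exceedances, which is why the GPD curve needs the same transformation.

```r
# Hypothetical helper (not from any package): GPD cdf with
# location mu, scale sigma and shape xi.
pgpd_sketch <- function(q, mu, sigma, xi) {
  s <- pmax((q - mu) / sigma, 0)        # cdf is 0 below the location
  if (xi == 0) return(1 - exp(-s))      # exponential limit
  1 - pmax(1 + xi * s, 0)^(-1 / xi)     # pmax caps the cdf at 1 when xi < 0
}

set.seed(1)
z      <- rnorm(1000, mean = 1, sd = 0.8)  # simulated stand-in for the data
thr    <- 1.6                              # threshold from the question
tail_z <- z[z > thr]                       # exceedances ("t" in the question)
u      <- mean(z <= thr)                   # proportion at or below the threshold

x <- seq(thr, 3, by = 0.1)

# Above the threshold the two empirical cdfs differ only by this rescaling.
stopifnot(isTRUE(all.equal(ecdf(z)(x), u + (1 - u) * ecdf(tail_z)(x))))

# Full-sample ecdf with the rescaled GPD curve (illustrative parameters).
plot(ecdf(z), do.points = FALSE, verticals = TRUE,
     xlim = c(thr, 3), ylim = c(u, 1))
lines(x, u + (1 - u) * pgpd_sketch(x, thr, 0.45, -0.2), col = "red")
```

With real data, replace pgpd_sketch and its parameters by the fitted pgpd call, keeping the argument order of the package in use.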

J. R. M. Hosking