[R] distribution fitting

Thu Oct 23 13:40:45 CEST 2008

Dear R-help readers,

I am writing to you in order to ask you a few questions about
distribution fitting in R.

I am trying to find out whether the set of event interarrival times
that I am currently analyzing is distributed with a Gamma or General
Pareto distribution. The event arrival granularity is in minutes and
interarrival times are in seconds, so the values I have are 0, 60,
120, 180, and so on...

Here is what I did. Since several values of the interarrival_times
array are zero, to avoid numerical errors when calculating their
logarithm, I set them to 1E-10:

> # fix numerical values
> for (i in 1:length(interarrival_times)) {
+       if (interarrival_times[i] == 0)
+               interarrival_times[i] = 0.0000000001
+ }

Then I defined the negative log likelihood for the Gamma distribution:

> nll_gamma_log <- function(lambda_log, alfa_log) {
+       n <- length(interarrival_times)
+       x <- interarrival_times
+       -n*exp(alfa_log)*lambda_log + n*log(gamma(exp(alfa_log))) -
(exp(alfa_log)-1)*sum(log(x)) + exp(lambda_log)*sum(x)
+ }

and passed it to the mle function:

> est_gamma <- mle(minuslog = nll_gamma_log, start = list(lambda_log = 0, alfa_log=0))
There were 50 or more warnings (use warnings() to see the first 50)

of course I got many "value out of range" warnings:

> warnings()
Warning messages:
1: In log(gamma(exp(log_alfa))) ... : value out of range in 'gammafn'
[...]
7: In gamma(exp(log_alfa)) ... : NaNs produced
[...]
50: In log(gamma(exp(log_alfa))) ... : value out of range in 'gammafn'

but that was expected, right?

The real problems come out when I try to fit a General Pareto
distribution. Using the same method, I calculated the negative log
likelihood function for the General Pareto distribution:

> # negative log-likelihood function for general pareto distribution
> nll_gpd <- function(xi, log_sigma, mu) { # shape, scale, location
+         n <- length(interarrival_times)
+         x <- interarrival_times
+         cat("xi:",xi,"; log_sigma:",log_sigma,"; mu:",mu,"\n")
+         n*log_sigma + (xi+1)*sum(log(1+xi*(x-mu)/exp(log_sigma)))/xi
+ }

and passed it to the mle function. However, this time I get some errors:

> est_gpd <- mle(minuslog = nll_gpd, start = list(xi = 1, log_sigma = 1, mu = 1))
xi: 1 ; log_sigma: 1 ; mu: 1
xi: 1.001 ; log_sigma: 1 ; mu: 1
xi: 0.999 ; log_sigma: 1 ; mu: 1
xi: 1 ; log_sigma: 1.001 ; mu: 1
xi: 1 ; log_sigma: 0.999 ; mu: 1
xi: 1 ; log_sigma: 1 ; mu: 1.001
xi: 1 ; log_sigma: 1 ; mu: 0.999
xi: 52007.84 ; log_sigma: 13414.95 ; mu: 4241.565
xi: 10402.37 ; log_sigma: 2683.789 ; mu: 849.113
xi: 18.08721 ; log_sigma: 3.862682 ; mu: 2.627865
xi: 18.08721 ; log_sigma: 3.860682 ; mu: 2.627865
Error in optim(start, f, method = method, hessian = TRUE, ...) :
  non-finite finite-difference value [2]
In addition: There were 50 or more warnings (use warnings() to see the first 50)

I also get many warnings:

> warnings()
Warning messages:
1: In base::log(x, base) ... : NaNs produced
2: In base::log(x, base) ... : NaNs produced
[...]
50: In base::log(x, base) ... : NaNs produced

I really don't know how to fix this problem. Do you have any suggestions?

Thank you very much in advance.

Best regards,
Mauro Tortonesi

--
Mauro Tortonesi, Ph.D.

Research Assistant
Distributed Systems Research Group
Engineering Department
University of Ferrara