[R] fitting distributions with R

(Ted Harding) Ted.Harding at nessie.mcc.ac.uk
Tue Sep 6 17:46:16 CEST 2005


On 06-Sep-05 Huntsinger, Reid wrote:
> The MLE of beta is the reciprocal of the sample mean, so you
> don't need an optimizer here.
> 
> Reid Huntsinger

While that is true (and Naja clearly knew this), nevertheless
one expects that using an optimiser should also work. Nadja's
observations need an explanation.

If things don't behave as expected, it is worth while embedding
"debug prints" so as to monitor what is going on internally (as
fas as one can). In this case, if one modifies Nadja's "ll"
function to

ll<-function(beta){
  n<-24
  x<-data2
  temp<-(-n*log(beta)+beta*sum(x))
  print(temp)
  temp
}

and re-runs 'mle', one sees that while there are some numerical
values in the output, there are many NaNs. Also, given the
warning message and the advice to look at "warnings()", one
learns that "NaNs produced in: log(x)" repeatedly. This very
strongly suggests that attempts have been made to take logs
of negative numbers which in trun suggests that the method
of computing the next approximation readily takes the value
of beta outside the valid range of beta > 0.

Now is the time to look at "?mle", which says that the default
method is "BGFS" for which "see optim". Under "?optim" we learn
that "BGFS" is a "quasi-Newton method". Such methods work by
calculating a local tangent to the derivative function and
extrapolating this until it meets the beta-axis, and this can
easily take the estimate outside admissible ranges (try using
Newton-Raphson to solve sqrt(x) = 0).

However, a related method available for 'optim' is "L-BFGS-B"
"which allows _box constraints_, that is each variable can be
given a lower and/or upper bound. The initial value must satisfy
the constraints." This can be set in a parameter for 'mle'.

So now you can try something like

  est<-mle(minuslog=ll, start=list(beta=0.1),
           method="L-BFGS-B", lower=10*(.Machine$double.eps))

and now the trace-prints show a series of numbers, with no NaNs,
so clearly we are getting somewhere (and have identified and
dealt with at least one aspect of the problem). However, observing
the prints, one sees that after an initial trend to convergence
there is a tendency to oscillate between values in the neighbouhood
of beta=360 and values in the neighbourhood of beta=800, finally
failing when two successive values "360.6573" are printed, which
in turn suggests that an attempt is made to comuted a gradient
from identical points. So clearly there is something not right
about how the method works for this particular problem (which,
as a statistical estimation problem, could hardly be simpler!).

Now, "?optim" has, at the end, a Note to the effect that the
default method (admittedly Nelder-Mead, which is not relevant
to the above) may not work well and suggests using 'optimize'
instead. So let's try 'optimize' anyway.

Now, with

  optimize(ll,lower=10*(.Machine$double.eps),upper=1e10)

we get a clean set of debug-prints, and convergence to

  beta = 5.881105e-05

with "minimum" 'll' equal to 254.6480.

Now compare with the known MLE which is

  beta = 1/mean(data2) = 6.766491e-05

giving

  ll(1/mean(data2)) = 254.4226, 

So clearly, now, using 'optimise' instead of 'optim' which
is what 'mle' uses, we are now "in the right parish". However,
there is apparently no parameter to 'mle' which would enable
us to force it to use 'optimize' rather than 'optim'!

This interesting saga, provoked by Nadja's query, now raises
an important general question: Given the simplicity of the
problem, why is the use of 'mle' so unexpectedly problematic?

While in the case of an exponential distribution (which has a
well-known analytical solution) one would not want to use 'mle'
to find the MLE (except as a test of 'mle'. perhaps), one can
easily think of other distributions, in form and behaviour very
similar to the negative exponential but without analytical solution,
for which use of 'mle' or some other optimisation routine would
be required. Such distributions could well give rise to similar
problems -- or worse: in Nadja's example,it was clear that it was
not working; in other cases, it might appear to give a result,
but the result might be very wrong and this would not be obvious.

Hmmm.

Ted.

> 
> -----Original Message-----
> From: r-help-bounces at stat.math.ethz.ch
> [mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Nadja Riedwyl
> Sent: Tuesday, September 06, 2005 9:39 AM
> To: r-help at stat.math.ethz.ch
> Subject: [R] fitting distributions with R
> 
> 
> Dear all
> I've got the dataset
> data:2743;4678;21427;6194;10286;1505;12811;2161;6853;2625;14542;694;1149
> 1;
> _ _ _ _ _ 14924;28640;17097;2136;5308;3477;91301;11488;3860;64114;14334
> I know from other testing that it should be possible to fit the data
> with
> the 
> exponentialdistribution. I tried to get parameterestimates for the 
> exponentialdistribution with R, but as the values 
> of the parameter are very close to 0 i get into troubles. Do you know,
> what
> i 
> could do in order to get estimates?How do you choose the starting
> values? in
> 
> my opinion it should be around 1/mean(data).
> 
>  
>#Parameterestimation _with mle() with the log-likelihood funktion of the
>#_
>#exponentialdistribution
> library(stats4) 
> ll<-function(beta) 
> {n<-24 
> x<-data2
> -n*log(beta)+beta*sum(x)} 
> est<-mle(minuslog=ll, start=list(beta=0.1))
> summary(est) 
> 
>#instead of a result, i get:
> 
> 
> Error in optim(start, f, method = method, hessian = TRUE, ...) :
> _ _ _ _ non-finite finite-difference value [1]
> In addition: There were 50 or more warnings (use warnings() to see the
> first
> 
> 50)
>#with fitdistr() for the exponentialdistribution
> library(MASS)
> fitdistr(data2,densfun=dexp,start=list(rate=0.1),lower=6e-06,method="BFG
> S")
> 
>#instead of a result, i get
> 
> Error in optim(start, mylogfn, x = x, hessian = TRUE, ...) :
> _ _ _ _ non-finite finite-difference value [1]
> In addition: Warning messages:
> 1: bounds can only be used with method L-BFGS-B in: optim(start,
> mylogfn, x
> = 
> x, hessian = TRUE, ...)
> 2: NaNs produced in: dexp(x, 1/rate, log)
> 
> 
> i'll be very happy for any help i can get to solve this problem
> thank you!
> 
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide!
> http://www.R-project.org/posting-guide.html
> 
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide!
> http://www.R-project.org/posting-guide.html

--------------------------------------------------------------------
E-Mail: (Ted Harding) <Ted.Harding at nessie.mcc.ac.uk>
Fax-to-email: +44 (0)870 094 0861
Date: 06-Sep-05                                       Time: 16:46:12
------------------------------ XFMail ------------------------------




More information about the R-help mailing list