[R] Problem with MASS::fitdistr().

Sun Apr 26 10:50:36 CEST 2020

Dear Ms. Spurdle ,

Thanks for looking into this.  I think that you are correct in that it 
is a problem with the hessian calculation.  It seems that fitdistr() 
explicitly sets hessian=TRUE, with no possibility of opting out.

It also seems that optim() ignores the "lower" argument when computing 
the hessian.  I tried setting hessian=TRUE in my roll-your-own call to 
optim() and got the same crash, with alpha  = -0.001999985, which of 
course results in disaster.

Prof. Nash from time to time points out, on this and similar lists, that 
optim() --- which I believe he wrote --- is subject to problems. 
Perhaps this is one of them.  It makes sense that there would be 
difficulties with computing a hessian when the parameters are subject to 
constraints.

It's not at all clear to me how or if these difficulties can be 
circumvented.

I figured that it was kind of nice that fitdistr() provides standard 
errors for the parameter estimates, but this isn't really vital for what 
I am trying to do --- and if trying to find these standard errors makes 
the estimation procedure fall over, then clearly  standard errors have 
to be ditched.

Thanks again for looking into my problem.

cheers,

Rolf

On 26/04/20 7:29 pm, Abby Spurdle wrote:

> fitdistr computes a Hessian matrix.
> I think optim ignores the lower value computing the Hessian.
> 
> Here's the result after removing the Hessian and Hessian-dependent info:
> 
>> str (fit)
> List of 3
>   $ estimate: Named num [1:2] 0.0000000149 1.0797684972
>    ..- attr(*, "names")= chr [1:2] "alpha" "beta"
>   $ loglik  : num -2039
>   $ n       : int 1529
>   - attr(*, "class")= chr "fitdistr"
> 
> 
> On Sun, Apr 26, 2020 at 7:02 PM Abby Spurdle <spurdle.a using gmail.com> wrote:
>>
>> I ran your example.
>> It's possible that it's another bug in the optim function.
>>
>> Here's the optim call (from within fitdistr):
>>
>> stats::optim(x = c(1, 4, 1, 2, 3, 1, 1, 1, 2, 2, 2, 2, 1, 1,
>> 1, 1, 1, 4, 4, 3, 1, 2, 2, 1, 1, 3, 1, 1, 1, 4, 1, 1, 1, 1, 1, #more lines...
>> 1, 4, 1, 1, 1, 5, 5, 5, 4, 5, 2, 5, 5, 5, 5, 3, 3, 5, 4, 5, 2, #removed...
>> 4, 5, 5), topn = 5, lower = lwr, par = list(alpha = 1.010652,
>>      beta = 1.929018), fn = function(parm, ...) -sum(log(dens(parm, ...))),
>>      hessian = TRUE, method = "L-BFGS-B")
>>
>> And here's dens:
>>
>> function (parm, x, ...)
>> densfun(x, parm[1], parm[2], ...)
>>
>> I can't see any reason why it should call dens with parm[1] < lower[1].
>>
>> On Sun, Apr 26, 2020 at 5:50 PM Abby Spurdle <spurdle.a using gmail.com> wrote:
>>>
>>> I haven't run your example.
>>> I may try tomorrow-ish if no one else answers.
>>>
>>> But one question: Are you sure the "x" and "i" are correct in your function?
>>> It looks like a typo...
>>>
>>>
>>> On Sun, Apr 26, 2020 at 2:14 PM Rolf Turner <r.turner using auckland.ac.nz> wrote:
>>>>
>>>>
>>>> For some reason fitdistr() does not seem to be passing on the "..."
>>>> argument "lower" to optim() in the proper manner, and as result
>>>> falls over.
>>>>
>>>> Here is my example; note that data are attached in the file "x.txt".
>>>>
>>>> dhse <- function(i,alpha,beta,topn) {
>>>>      x <- seq(0,1,length=topn+2)[-c(1,topn+2)]
>>>>      p <- dbeta(x,alpha,beta)
>>>>      if(any(!is.finite(p))) browser()
>>>>      (p/sum(p))[i]
>>>> }
>>>>
>>>> lwr  <- rep(sqrt(.Machine$double.eps),2)
>>>> par0 <- c(alpha=1.010652,beta=1.929018)
>>>> x    <- dget("x.txt")
>>>> fit  <- MASS::fitdistr(x,densfun=dhse,topn=5,start=as.list(par0),
>>>>                         lower=lwr)
>>>>
>>>> The browser() in dhse() allows you to see that alpha has gone negative,
>>>> taking a value:
>>>>
>>>>>         alpha
>>>>> -0.001999985
>>>>
>>>> Continuing causes fitdistr() to fall over with the error message:
>>>>
>>>>> Error in stats::optim(x = c(1, 4, 1, 2, 3, 1, 1, 1, 2, 2, 2, 2, 1, 1,  :
>>>>>    non-finite finite-difference value [1]
>>>>
>>>> If I eschew using fitdistr() and "roll-my-own" as follows:
>>>>
>>>> foo <- function(par,x,topn){-sum(log(dhse(i=x,alpha=par[1],
>>>>                                             beta=par[2],
>>>>                                             topn=topn)))}
>>>>
>>>> fit <- optim(par0,fn=foo,method="L-BFGS-B",lower=lwr,topn=5,x=x)
>>>>
>>>> then optim() returns a result without complaint.
>>>>
>>>> Am I somehow messing up the syntax for fitdistr()?
>>>>
>>>> cheers,
>>>>
>>>> Rolf Turner
>>>>
>>>> P. S. I've tried supplying the "method" argument, method="L-BFGS-B"
>>>> explicitly to fitdistr(); doesn't seem to help.
>>>>
>>>> R.T.