[R] Problem with MASS::fitdistr().

Mon Apr 27 22:56:28 CEST 2020

Hello,

Inline.

Às 21:30 de 27/04/20, J C Nash escreveu:
> After looking at MASS::fitdistr and fitdistrplus::fitdist, the latter seems to have
> code to detect (near-)singular hessian that is almost certainly the "crash site" for
> this thread. Was that package tried in this work?

I tried it. I didn't post the results because I thought that others had 
probably also tried it and I had no real contribution to give to this 
thread. Except maybe that fitdistrplus::fitdist also hardcodes

hessian = TRUE

in the calls to optim.

And that I had to add 'topn' in the 'start' list, not just 'par0'.
Here is what I got:

library(fitdistrplus)

fitdist(x, distr = dhse, start = as.list(c(par0, topn = 5)))
Called from: dhse(c(1, 4, 1, 2, 3, 1, 1, 1, 2, 2, 2, 2, 1, 1, 1, 1, 1, 4,
4, 3, 1, 2, 2, 1, 1, 3, 1, 1, 1, 4, 1, 1, 1, 1, 1, 1, 3, 4, 1,

[...]

5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 3, 5, 5, 5, 1, 2, 2, 2, 3, 1,
3, 2, 3, 1, 4, 1, 1, 3, 4, 2, 1, 1, 4, 3, 5, 5, 1, 1, 4, 1, 1,
1, 5, 5, 5, 4, 5, 2, 5, 5, 5, 5, 3, 3, 5, 4, 5, 2, 4, 5, 5),
     alpha = -0.0865702222222222, beta = 1.7577217037037, topn = 
5.77314814814815)
Browse[1]> Q

Hope this helps,

Rui Barradas

> 
> I agree with Mark that writing one's own code for this is a lot of work, and I know
> the folk who worked on fitdistrplus did a lot more distribution fitting problems
> than I ever did, and I suspect they encountered this issue on occasions.
> 
> JN
> 
> On 2020-04-26 9:18 p.m., Mark Leeds wrote:
>> it's been a looooooooong time but I vaguely remember Rvmminb computing
>> gradients ( and possibly hessians )
>> subject to constraints. John can say more about this but, if one is going
>> to go through the anguish of
>> creating a fitdstr2, then you may want to have it call Rvmminb instead of
>> whatever is currently
>> being called.
>>
>>
>>
>> On Sun, Apr 26, 2020 at 8:55 PM Abby Spurdle <spurdle.a using gmail.com> wrote:
>>
>>> I thought about this some more and realized my last suggestion is
>>> unlikely to work.
>>> Another possibility would be to create a new function to compute the
>>> Hessian with a smaller step size, but I suspect there will be more
>>> problems.
>>>
>>> Possibly a much simpler approach would be to:
>>>
>>> Modify the source for fitdistr.
>>> (Copy the source and create a new function, say fitdistr2).
>>>
>>> Modify it not compute the Hessian in the optim call.
>>> Then after the optim call, test the parameter estimates.
>>> If they're very close to the boundaries (here zero), then they're
>>> flagged as near-boundary cases and the fitdistr2 function returns the
>>> parameter estimates without the Hessian and related info.
>>> (Possibly generating a warning).
>>>
>>> If they're sufficiently distant, the Hessian and related info can be
>>> computed in separate steps and returned.
>>> (Equivalent to what it does currently).
>>>
>>> I note that there's at least one R package (numDeriv), and maybe more,
>>> for computing the Hessian, numerically.
>>>
>>>
>>> On Mon, Apr 27, 2020 at 9:31 AM Abby Spurdle <spurdle.a using gmail.com> wrote:
>>>>
>>>>> Dear Ms. Spurdle
>>>>
>>>> I usually refer to myself as "He".
>>>> (But then, that's not the whole story...)
>>>>
>>>> I'm not an expert on maximum likelihood approaches.
>>>> So, I apologize if the following suggestion is a poor one.
>>>>
>>>> Does your likelihood function have a limit, as alpha approaches zero
>>> (say zero)?
>>>> If so, the limit of the log-likelihood would be -Inf, would it not.
>>>>
>>>> You could create a function representing the likelihood or
>>>> log-likelihood by wrapping your density function.
>>>> The function could allow alpha/beta values equal to or below zero, and
>>>> return the limit.
>>>> This is mathematically incorrect, but may be sufficient for
>>>> permissible estimates of the second-order partial derivatives.
>>>> Depending on the shape of the likelihood function these
>>>> pseudo-likelihoods maybe able to be improved...?
>>>>
>>>> You could then do a small modification on the source code for
>>>> MASS::fitdistr, such that the user specifies the likelihood function
>>>> or log-likelihood function, rather than the density...
>>>>
>>>> The fitdistr function is relatively complex, however, you would only
>>>> need to modify a couple of lines, the lines that create the fn
>>>> function...
>>>
>>> ______________________________________________
>>> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide
>>> http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>>
>>
>> 	[[alternative HTML version deleted]]
>>
>> ______________________________________________
>> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
> 
> ______________________________________________
> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>