[R] Maximum likelihood estimation (stats4::mle)

Tue Jul 22 15:26:03 CEST 2014

On 22 Jul 2014, at 06:04 , David Winsemius <dwinsemius at comcast.net> wrote:

> 
> On Jul 21, 2014, at 12:10 PM, Ronald Kölpin wrote:
> 
>> Dear R-Community,
>> 
>> I'm trying to estimate the parameters of a probability distribution
>> function by maximum likelihood estimation (using the stats4 function
>> mle()) but can't seem to get it working.
>> 
>> For each unit of observation I have a pair of observations (a, r)
>> which I assume (both) to be log-normal distributed (iid). Taking the
>> log of both values I have (iid) normally distributed random variables
>> and the likelihood function to be estimated is:
>> 
>> L = Product(F(x_i) - F(y_i), i=1..n)
>> 
>> where F is the Normal PDF and (x,y) := (log(a), log(r)). Taking the
>> log and multiplying by -1 gives the negative loglikelihood
> 
> I don't see the need to multiply by -1. The log of any probability is of necessity less than (or possibly equal to) 0 since probabilities are bounded above by 1.

Well, mle() wants to minimize -log L by definition. That's just because optimizers tend to prefer to have an objective function to minimize rather than maximize.

However, what is keeping the terms of Product(F(x_i) - F(y_i), i=1..n) from going negative? As far as I can tell, the x_i are bigger than the corresponding y_i, and F is an increasing function so all terms are negative and the log of each term is undefined. Switching the order should help.

Also, what is the rationale for this form of likelihood? Surely not that the x,y pairs are independent lognormals. Looks more like what you'd get from interval censored data. If so, it should be possible to come up with better starting values than mu=0, sd=1.

-pd

> So sums of these number will be negative which then allows you to maximize their sums.
> 
>> 
>> l = Sum(log( F(x_i) - F(y_i) ), i=1..n)
>> 
>> However estimation by mle() produces the error "vmmin is not finite"
> 
> 
> As I would have predicted. If one maximizes numbers that get larger as probabilities get small this is what would be expected.
> 
> -- 
> David.
> 
>> and "NaN have been created" - even though put bound on the parameters
>> mu and sigma (see code below).
>> 
>> 
>> library("stats4")
>> 
>> gaps <- matrix(nrow=10, ncol=4, dimnames=list(c(1:10),c("r_i", "a_i",
>> "log(r_i)", "log(a_i)")))
>> gaps[,1] <- c(2.6, 1.4, 2.2, 2.9, 2.9, 1.7, 1.3, 1.7, 3.8, 4.5)
>> gaps[,2] <- c(9.8, 20.5, 8.7, 7.2, 10.3, 11, 4.5, 5.2, 6.7, 7.6)
>> gaps[,3] <- log(gaps[,1])
>> gaps[,4] <- log(gaps[,2])
>> 
>> nll <- function(mu, sigma)
>> {
>>   if(sigma >= 0 && mu >= 0)
>>   {
>>       -sum(log(pnorm(gaps[,3], mean=mu, sd=sigma) - pnorm(gaps[,4],
>> mean=mu, sd=sigma)))
>>   }
>>   else
>>   {
>>       NA
>>   }
>> }
>> 
>> fit <- mle(nll, start=list(mu=0, sigma=1), nobs=10)
>> print(fit)
>> 
>> 
>> To be honest, I'm stumped and don't quite know what the problem is...
>> 
>> Regards and Thanks,
>> 
>> Ronald Kölpin
>> 
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
> 
> David Winsemius
> Alameda, CA, USA
> 
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

-- 
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd.mes at cbs.dk  Priv: PDalgd at gmail.com