[R] nlminb() - how do I constrain the parameter vector properly?

Mon Oct 21 02:19:07 CEST 2013

On Oct 20, 2013, at 3:01 PM, Steven LeBlanc wrote:

> Greets,
> 
> I'm trying to use nlminb() to estimate the parameters of a bivariate normal sample and during one of the iterations it passes a parameter vector to the likelihood function resulting in an invalid covariance matrix that causes dmvnorm() to throw an error. Thus, it seems I need to somehow communicate to nlminb() that the final three parameters in my parameter vector are used to construct a positive semi-definite matrix, but I can't see how to achieve this using the constraint mechanism provided. Additional details are provided in the code below.
> 

I recently noticed that teh dmvnorn function in package mixtools doe not throw an error when given a non-positive definite varaince-covariance matrix. Whether it should be doing so might be up for debate. Peter Dalgaard seemed to think that any self respecting such function _should_ be throwing NaN's back at you.

The code it was using is rather compact:

> mixtools::dmvnorm
function (y, mu = NULL, sigma = NULL) 
{
    exp(logdmvnorm(y, mu = mu, sigma = sigma))
}
<environment: namespace:mixtools>

> mixtools::logdmvnorm
function (y, mu = NULL, sigma = NULL) 
{
    if (is.vector(y)) 
        y <- matrix(y, nrow = 1)
    d <- ncol(y)
    if (!is.null(mu)) 
        y <- sweep(y, 2, mu, "-")
    if (is.null(sigma)) 
        sigma <- diag(d)
    k <- d * 1.83787706640935
    a <- qr(sigma)
    logdet <- sum(log(abs(diag(a$qr))))
    if (nrow(y) == 1) 
        mahaldist <- as.vector(y %*% qr.solve(a, t(y)))
    else mahaldist <- rowSums((y %*% qr.solve(a)) * y)
    -0.5 * (mahaldist + logdet + k)
}

I noticed that you were logging the dmvnorm values and so I wondered also if going directly to that function might save some time. (Note that I am way out of my mathematical dept here. Caveat emptor, maxima caveat emptor).

-- 
David.

> Suggestions?
> 
> Best Regards,
> Steven
> 
> Generate the data set I'm using:
> 
> mu<-c(1,5)
> sigma<-c(1,2,2,6)
> dim(sigma)<-c(2,2)
> set.seed(83165026)
> sample.full<-sample.deleted<-mvrnorm(50,mu,sigma)
> delete.one<-c(1,5,13,20)
> delete.two<-c(3,11,17,31,40,41)
> sample.deleted[delete.one,1]<-NA
> sample.deleted[delete.two,2]<-NA
> missing<-c(delete.one,delete.two)
> complete<-sample.deleted[!(is.na(sample.deleted[,1]) | is.na(sample.deleted[,2])),]
> deleted<-sample.deleted[missing,]
> 
> Try to estimate the parameters of the data set less the deleted values:
> 
> exact<-function(theta,complete,deleted){
>    one.only<-deleted[!(is.na(deleted[,1])),1]
>    two.only<-deleted[!(is.na(deleted[,2])),2]
>    u<-c(theta[1],theta[2])
>    sigma<-c(theta[3],theta[5],theta[5],theta[4])
>    dim(sigma)<-c(2,2)
>    -sum(log(dmvnorm(x=complete,mean=u,sigma=sigma)))-
>        sum(log(dnorm(one.only,u[1],sigma[1,1])))-
>            sum(log(dnorm(two.only,u[2],sigma[2,2])))
> }
> nlminb(start=c(0,0,1,1,0),objective=exact,complete=complete,deleted=deleted,control=list(trace=1))
> 
> Escape and it stops at Iteration 9 on my machine.
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

David Winsemius
Alameda, CA, USA