[R] parameters estimation of a normal-lognormal multivariate model

Mon Jun 10 22:01:03 CEST 2013

Dear all,

I have to create a model which is a mixture of a normal and log-normal distribution. To create it, I need to estimate the 2 covariance matrixes and the mixing parameter (total =7 parameters) by maximizing the log-likelihood function. This maximization has to be performed by the nlm routine.
As I use relative data, the means are known and equal to 1.

I’ve already tried to do it in 1 dimension (with 1 set of relative data) and it works well. However, when I introduce the 2nd set of relative data I get illogical results for the correlation and a lot of warnings messages (at all 25).

To estimates the parameters I defined first the log-likelihood function with the 2 commands dmvnorm and dlnorm.plus. Then I assign starting values of the parameters and finally I use the nlm routine to estimate the parameters (see script below).

# Importing and reading the grid files. Output are 2048x2048 matrixes

P <- read.ascii.grid("d:/Documents/JOINT_FREQUENCY/grid_E727_P-3000.asc", return.header= FALSE ); 
V <- read.ascii.grid("d:/Documents/JOINT_FREQUENCY/grid_E727_V-3000.asc", return.header= FALSE ); 

p <- c(P); # tranform matrix into a vector
v <- c(V);

p<- p[!is.na(p)] # removing NA values
v<- v[!is.na(v)]

p_rel <- p/mean(p) #Transforming the data to relative values
v_rel <- v/mean(v) 
PV <- cbind(p_rel, v_rel) # create a matrix of vectors

L <- function(par,p_rel,v_rel) {

return (-sum(log( (1- par[7])*dmvnorm(PV, mean=c(1,1), sigma= matrix(c(par[1]^2, par[1]*par[2]*par[3],par[1]*par[2]*par[3], par[2]^2 ),nrow=2, ncol=2))+
par[7]*dlnorm.rplus(PV, meanlog=c(1,1), varlog= matrix(c(par[4]^2,par[4]*par[5]*par[6],par[4]*par[5]*par[6],par[5]^2), nrow=2,ncol=2))            )))

}
par.start<- c(0.74, 0.66 ,0.40, 1.4, 1.2, 0.4, 0.5) # log-likelihood estimators

result<-nlm(L,par.start,v_rel=v_rel,p_rel=p_rel, hessian=TRUE, iterlim=200, check.analyticals= TRUE)
Messages d'avis :
1: In log(eigen(sigma, symmetric = TRUE, only.values = TRUE)$values) :
  production de NaN
2: In sqrt(2 * pi * det(varlog)) : production de NaN
3: In nlm(L, par.start, p_rel = p_rel, v_rel = v_rel, hessian = TRUE) :
  NA/Inf replaced by maximum positive value
4: In log(eigen(sigma, symmetric = TRUE, only.values = TRUE)$values) :
  production de NaN
…. Until 25.

par.hat <- result$estimate

cat("sigN_p =", par[1],"\n","sigN_v =", par[2],"\n","rhoN =", par[3],"\n","sigLN_p =", par[4],"\n","sigLN_v =", par[5],"\n","rhoLN =", par[6],"\n","mixing parameter =", par[7],"\n")

sigN_p = 0.5403361 
 sigN_v = 0.6667375 
 rhoN = 0.6260181 
 sigLN_p = 1.705626 
 sigLN_v = 1.592832 
 rhoLN = 0.9735974 
 mixing parameter = 0.8113369

Does someone know what is wrong in my model or how should I do to find these parameters in 2 dimensions?

Thank you very much for taking time to look at my questions.

Regards,

Gladys Hertzog
Master student in environmental engineering, ETH Zurich