[R] Maximum likelihood estimation of a truncated regression model

Tue Sep 16 20:40:52 CEST 2008

Hi,

I have a quick question regarding estimation of a truncation
regression model (truncated above at 1) using MLE in R. I will be most
grateful to you if you can help me out.

The model is linear and the relationship is "dhat = bhat0+Z*bhat+e",
where dhat is the dependent variable >0 and upper truncated at 1;
bhat0 is the intercept; Z is the independent variable and is a uniform
random variate between 0 and 1; and e is the error term. I realised
that R doesn't have a built-in function to estimate truncated
regression models as does STATA, LIMDEP etc. I tried the survival and
FEAR packages and couldn't fit it for my case. So I wrote the log
likelihood function of the truncated regression model and
reparametrised it using Olsen (1978) so that the function is globally
concave and has an unique maximiser. I used a quasi-Newton method
(BFGS) to maximise my function. I also used Newton-Raphson method
(maxNR) to maximise my function. The (naive) code can be seen below.

sw1<-function(theta,dhat,z)
{
gamma<-theta[1:2]
eta<-theta[3]
d1<-dhat*eta-z%*%gamma
d2<- eta-z%*%gamma
logl<- (-n*0.5*log(2*pi))+(n*log(eta))-(0.5*sum(d1*d1))-(sum(log(pnorm(d2))))
return(-logl)
}
try(j<-optim(c(a,b,c),sw1,method="BFGS",dhat=dhat,z=z),silent=TRUE)
q[i,]=abs(j$par[2]/j$par[3])

I am trying to iterate the above procedure 2000 times and compute the
RMSE of the estimated bhat. In the above code, a, b, and c are the
intercept, bhat and 1/sigma estimates from OLS of the linear model [I
get the standard error estimate from OLS and obtain the sigma by
multiplying it by sqrt(n) where n=200]. I realised that in many of the
iterations, the above code doesn't optimize as the bhat estimate I get
from the above code is worser than the one I get from STATA although
optim (or maxNR) says the function has converged successfully. I
realise that the starting parameter values could be a reason for these
sub-optimal solutions.

1. Is there an optimization routine that doesn't need starting parameter values?
2. Can you please tell me what could be the possible error in the above code?
3. Is there a routine built in R that could do truncated regression
for my situation?

Thank you very much for your help.

Best Regards,

Srini