[R] maximum likelihood accuracy - comparison with Stata

Mon Mar 28 17:08:57 CEST 2011

On 2011-03-27 21:37, Alex Olssen wrote:
> Hi everyone,
>
> I am looking to do some manual maximum likelihood estimation in R.  I
> have done a lot of work in Stata and so I have been using output
> comparisons to get a handle on what is happening.
>
> I estimated a simple linear model in R with   lm()   and also my own
> maximum likelihood program.  I then compared the output with Stata.
> Two things jumped out at me.
>
> Firstly, in Stata my coefficient estimates are identical in both the
> OLS estimation by   -reg-  and the maximum likelihood estimation using
> Stata's   ml lf  command.
> In R my coefficient estimates differ slightly between   lm()   and my
> own maximum likelihood estimation.
>
> Secondly, the estimates for   sigma2   are very different between R
> and Stata.  3.14 in R compared to 1.78 in Stata.
>
> I have copied my maximum likelihood program below.  It is copied from
> a great intro to MLE in R by Macro Steenbergen
> http://artsci.wustl.edu/~jmonogan/computing/r/MLE_in_R.pdf
>
> Any comments are welcome.  In particular I would like to know why the
> estimate of   sigma2   is so different.  I would also like to know
> about the accuracy of the coefficient estimates.

Some comments:

1.
Since Kmenta is not in the datasets package, you should mention
where you're getting it. (I suppose that for economists, it's
obvious, but we're not all economists.) I used the version in
the systemfit package.

2.
I don't believe your Stata value for sigma2 (by which I assume
you mean sigma^2). Are you quoting sigma?

3.
I can't reproduce your R value of 3.14 for sigma2. I get 3.725391.

4.
(most important) There's a difference between the simple ML
estimate (which is biased) and R's unbiased estimate for sigma^2.
This dataset has 20 cases, so try multiplying your result by 20/17.

5.
As to any difference in coefficients, I would guess that the
difference is slight (you don't show what it is); it
may be due to the fact that optim() produces approximate
solutions, whereas in this simple case, an 'exact' solution
is possible (and found by R).

6.
In case you weren't aware of it: the stats4 package has an
mle() function.

Peter Ehlers

>
> ## ols
> ols<- lm(Kmenta$consump ~ Kmenta$price + Kmenta$income)
> coef(summary(ols))
>
> ## mle
> y<- matrix(Kmenta$consump)
> x<- cbind(1, Kmenta$price, Kmenta$income)
> ols.lf<- function(theta, y, x) {
>    N<- nrow(y)
>    K<- ncol(x)
>    beta<- theta[1:K]
>    sigma2<- theta[K+1]
>    e<- y - x%*%beta
>    logl<- -0.5*N*log(2*pi)-0.5*N*log(sigma2)-((t(e)%*%e)/(2*sigma2))
>    return(-logl)
> }
> p<- optim(c(0,0,0,2), ols.lf, method="BFGS", hessian=T, y=y, x=x)
> OI<- solve(p$hessian)
> se<- sqrt(diag(OI))
> se<- se[1:3]
> beta<- p$par[1:3]
> results<- cbind(beta, se)
> results
> sigma2<- p$par[4]
> sigma2
>
> Kind regards,
>
> Alex Olssen
> Motu Research
> Wellington
> New Zealand
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.