[R] Unable to reproduce Stata Heckman sample selection estimates

Fri Nov 25 17:05:31 CET 2011

Hi Arne,

Thanks for the reply.

I am using R version 2.14.0 and sampleSelection version 0.6.12.

I estimate the model by the 1-step ML method. However, when I use 
the 2-step method, the standard errors are reported as NA.

I call the selection() function in a very basic way, something to the 
effect of selection(selectionFormula, outcomeFormula, data = 
aDataFrame), where both formulas are plain and of the form 
y ~ x1 + x2 + ... + xp.
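
To show the shape of the call, here is a rough sketch. Everything in it 
is a stand-in: it uses the Mroz87 data that ships with sampleSelection 
and placeholder formulas, not my confidential data or my actual models, 
and the object names (fitML, fit2step) are made up for illustration. It 
is not expected to reproduce the problem.

```r
# Sketch only: Mroz87 and these formulas are placeholders (assumptions),
# not the confidential data or the real model specification.
library(sampleSelection)
data(Mroz87)
Mroz87$kids <- (Mroz87$kids5 + Mroz87$kids618) > 0

selectionFormula <- lfp ~ age + I(age^2) + faminc + kids + educ
outcomeFormula   <- wage ~ exper + I(exper^2) + educ + city

# 1-step maximum likelihood estimation
fitML <- selection(selectionFormula, outcomeFormula,
                   data = Mroz87, method = "ml")

# 2-step estimation of the same model
fit2step <- selection(selectionFormula, outcomeFormula,
                      data = Mroz87, method = "2step")

# Inspect the estimates and check the covariance matrix for
# non-finite entries (the symptom I see with my own data)
summary(fitML)
any(!is.finite(vcov(fitML)))
```

On my own data I run the same kind of call and then look at 
vcov(selectionObject) in the same way.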

I have read the associated paper, which is where I got the idea to 
pass the coefficients from a selection object to the start argument.

I will work on creating a minimal reproducible example; the dataset 
is large and confidential, the models long-ish.

 - Clara

On Friday, November 25, 2011 04:04:52 am Arne Henningsen wrote:
> On 25 November 2011 04:37, Yuan Yuan <y.yuan at vt.edu> wrote:
> > Hello,
> > 
> > I am working on reproducing someone's analysis which was done in
> > Stata. The analysis is estimation of a standard Heckman sample
> > selection model (Tobit-2), for which I am using the 
> > sampleSelection package and the selection() function. I have a few
> > problems with the estimation:
> > 
> > 1) The reported standard error for all estimates is Inf ...
> > vcov(selectionObject) yields Inf in every cell.
> > 
> > 2) While the selection equation coefficient estimates are almost
> > exactly the same as the Stata results, the outcome equation
> > coefficient estimates are quite different (different sign in one 
> > case, order of magnitude difference in some other cases).
> > 
> > 3) I can't seem to figure out how to specify the initial values 
> > for the MLE ... whatever argument I pass to start (even of the form
> > coef(selectionObject)), I get the following error:
> > Error in gr[, fixed] <- NA : (subscript) logical subscript too long
> > 
> > I have to admit I am pretty confused by #1, I feel like I must 
> > be doing something wrong, missing something obvious, but I have no
> > idea what. I figure #2 might be because the algorithms (selection and
> > Stata) are just finding different local maxima, but because of #3 I
> > can't test that guess by using different initial values in selection.
> > 
> > Let me know if I should provide any more information. Thanks in
> > advance for any pointers in the right direction.
> 
> Yes, please provide more information (see also the posting guide 
> [1]), e.g. which version of R and which version of the sampleSelection
> package are you using? Do you estimate the model by the two-step
> approach or by the 1-step maximum likelihood method? Which commands
> did you use? Can you send us a reproducible example? Have you read the
> paper about using the sampleSelection package [2]?
> 
> [1] http://www.r-project.org/posting-guide.html
> [2] http://www.jstatsoft.org/v27/i07
> 
> Best wishes from Copenhagen,
> Arne