[R] Heckman Selection Model Help in R

Arne Henningsen arne.henningsen at googlemail.com
Tue Jul 14 01:25:10 CEST 2009


On Mon, Jul 13, 2009 at 4:26 PM, saurav pathak<pathak.saurav at gmail.com> wrote:
> I am using R 2.9.1,

That's good!

> I am not sure about the version of sampleSelection and maxLik

This is important! Please check the version numbers, e.g. with
R> help(package="maxLik")
R> help(package="sampleSelection")
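
If you only want the version numbers printed on the console, something along the following lines might also work. This is just a sketch: it assumes a reasonably recent R version (packageVersion() and requireNamespace() may not be available in older releases), and the names "pkgs" and "versions" are made up for illustration.

```r
# Packages whose installed versions we want to report
pkgs <- c("maxLik", "sampleSelection")

# For each package, return the installed version as a string,
# or NA if the package is not installed on this machine
versions <- sapply(pkgs, function(p) {
  if (requireNamespace(p, quietly = TRUE)) {
    as.character(packageVersion(p))
  } else {
    NA_character_
  }
})

print(versions)
print(R.version.string)  # the version of R itself
```

An NA in the output would mean that the package is not installed at all.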

BTW: Did you install the development version of the maxLik package
from R-Forge? If yes, please use the stable version that is available
on CRAN.

> Let me explain. In my data, the DV used as 's' in the formula has some
> missing values, which can lead to bias in our case, so I used
>
> adpopdata$s <- ifelse(is.na(ln_oy5_1), 0, 1)
>
> i.e. I convert all the missing values of the variable ln_oy5_1 to 0 and all
> non-missing values to 1. So do the missing values come from the IVs used in
> the following?
>
> myProbit <- glm(s ~ age + gender + gemedu + gemhinc + es_gdppc +
>     imf_pop + estbbo_m, family = binomial(link = "probit"))
>
> I don't know where the missing values are coming from. Can you suggest how
> to correct this?

They might come from the explanatory variables. Please check, e.g. with
R> sum( is.na( adpopdata$s ) )
R> sum( is.na( adpopdata$age ) )
R> sum( is.na( adpopdata$gender ) )
...
R> sum( is.na( adpopdata$estbbo_m ) )
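
The same check can be done for all variables at once. The following is a minimal sketch on a small made-up data frame ("toy" is only a stand-in for adpopdata, and "vars" would have to be extended to the full list of variables in the probit); it counts the NAs per variable and the number of rows that have no NA in any of them, which is the number of observations glm() can use.

```r
# Hypothetical toy data standing in for 'adpopdata'
toy <- data.frame(
  s      = c(1, 0, 1, 1, 0, 1),
  age    = c(25, NA, 40, 31, 58, NA),
  gender = c(1, 0, NA, 1, 0, 1)
)

# Variables that enter the probit (extend to the full list as needed)
vars <- c("s", "age", "gender")

# Number of missing values per variable
naCounts <- colSums(is.na(toy[, vars]))
print(naCounts)

# Number of rows without an NA in any of these variables
nComplete <- sum(complete.cases(toy[, vars]))
print(nComplete)
```

If nComplete is smaller than nrow() of the data set, the difference is the number of observations that are dropped from the estimation.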

Please "reply to all" (i.e. including R-help) so that others who will
have similar questions and problems in the future could benefit from
our discussion.

Arne

> On Mon, Jul 13, 2009 at 3:09 PM, Arne Henningsen
> <arne.henningsen at googlemail.com> wrote:
>>
>> On Mon, Jul 13, 2009 at 11:18 AM, Pathak,
>> Saurav<s.pathak08 at imperial.ac.uk> wrote:
>> > Dear Arne
>> > I have gone through the paper and I have tried it at my end, I would
>> > really appreciate if you could address the following:
>> >
>> > 1. Based upon your suggestion I used the following:
>> >
>> > regmod2 <- selection(s ~ age + gender + gemedu + gemhinc + es_gdppc +
>> >    imf_pop + estbbo_m, ln_oy5_1 ~ age+ gender+fearfail+gemedu,
>> > adpopdata, method = "2step")
>> > On trying the above (notice that I have changed "heckit" to "selection"
>> > in the command), I get the following error message:
>> >
>> > Error in coef.probit(result$probit) :
>> >  could not find function "coef.maxLik"
>>
>> That's weird. Which versions of R, sampleSelection, and maxLik do you use?
>>
>> > Before trying the above I tried the following:
>> >
>> > 2. When I tried to do the Heckman selection model in stages , the first
>> > run was successful, I mean, using the following:
>> >
>> > myProbit <- glm(s ~ age + gender + gemedu + gemhinc + es_gdppc +
>> >     imf_pop + estbbo_m, family = binomial(link = "probit"))
>> > summary(myProbit)
>> >
>> > I am successful up to this point, but
>> >
>> > 3. When I try calculating the IMR using the following:
>> > adpopdata$IMR<-invMillsRatio(myProbit)$IMR1
>> >
>> > I get the error below
>> > Error in `$<-.data.frame`(`*tmp*`, "IMR", value = c(2.50039945424535,  :
>> >  replacement has 257358 rows, data has 343251
>>
>> I guess that you have some NAs in the data so that you have the IMRs
>> not for all observations but only for the observations without NAs.
>>
>> R> myIMRs <- invMillsRatio(myProbit)$IMR1
>> should work.
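
One possible way to get IMRs that line up with the rows of the data set is to fit the probit with na.action = na.exclude, so that predict() pads its result with NA for the dropped observations. The sketch below uses made-up data ("toy") and computes the ratio by hand as dnorm(xb) / pnorm(xb), which is my understanding of the IMR for the selected observations; please check it against the result of invMillsRatio() before relying on it.

```r
# Hypothetical toy data with missing values in one regressor
set.seed(1)
toy <- data.frame(age    = c(rnorm(48, 40, 10), NA, NA),
                  gender = rep(c(0, 1), 25))
toy$s <- as.numeric(toy$age + 5 * toy$gender + rnorm(50, 0, 10) > 42)
toy$s[is.na(toy$s)] <- 0  # the selection indicator itself has no NAs

# na.exclude remembers which rows were dropped, so that predict()
# returns NA at those positions instead of a shorter vector
myProbit <- glm(s ~ age + gender, family = binomial(link = "probit"),
                data = toy, na.action = na.exclude)

# Linear predictor, one value per row of 'toy' (NA where a regressor is NA)
xb <- predict(myProbit, type = "link")

# Inverse Mills ratio computed by hand: phi(xb) / Phi(xb)
toy$IMR <- dnorm(xb) / pnorm(xb)

print(c(rows = nrow(toy), imrs = length(xb), missing = sum(is.na(toy$IMR))))
```

With the default na.action (na.omit), the vector would have fewer elements than the data set has rows, which is what triggers the "replacement has ... rows" error.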
>>
>> > Is there code to calculate the IMR by hand?
>>
>> Yes, inside invMillsRatio()
>> However, why do you want to do this?
>>
>> > What I see is that the number of rows of the calculated IMR and the number
>> > of rows in the actual data set do not match (maybe because of missing
>> > values; I am not sure. If so, how do I fix it?), and hence the IMR could
>> > not be added to my original data set. How do I fix this and then proceed
>> > to get the correct IMR to use in my outcome equation (the OLS stage)?
>> >
>> > This is really taking a lot of time; I have been working on it for weeks.
>> > Can you please help me? If you wish, I can send you the data set as
>> > well.
>>
>> Please try to fix it yourself.
>>
>> Arne
>>
>> >
>> > -----Original Message-----
>> > From: Arne Henningsen [mailto:arne.henningsen at googlemail.com]
>> > Sent: 13 July 2009 00:56
>> > To: Pathak, Saurav; r-help at r-project.org; otoomet at ut.ee
>> > Subject: Re: Heckman Selection MOdel Help in R
>> >
>> > Hi Saurav!
>> >
>> > On Sun, Jul 12, 2009 at 6:06 PM, Pathak,
>> > Saurav<s.pathak08 at imperial.ac.uk> wrote:
>> >> I am new to R. I have to do a 2-step Heckman model. My selection
>> >> equation is below, which I was successful in running, but I am unable
>> >> to proceed further.
>> >>
>> >>
>> >>
>> >> I have so far used the following command
>> >>
>> >> glm(formula = s ~ age + gender + gemedu + gemhinc + es_gdppc +
>> >>     imf_pop + estbbo_m, family = binomial(link = "probit"))
>> >>
>> >> My questions are:
>> >> 1. How do I discard the non-significant selection variables (one out of
>> >> the seven variables above is non-significant) and calculate the inverse
>> >> Mills ratio of the significant variables?
>> >>
>> >> 2. I need the inverse Mills ratio from the above probit as a control in
>> >> my outcome equation, which I will estimate by OLS; hence I need to get
>> >> the IMR. (Is there another direct way?)
>> >>
>> >> 3. How can this be done in R using my concept, or does there exist
>> >> another way of doing what I wish to achieve?
>> >>
>> >>
>> >>
>> >> On trying
>> >>
>> >> regmod <- heckit(s ~ age + gender + gemedu + gemhinc + es_gdppc +
>> >>     imf_pop + estbbo_m, ln_oy5_1 ~ age + gender + fearfail + gemedu,
>> >>     adpopdata, method = "2step")
>> >>
>> >> I get
>> >>
>> >> Error: could not find function "heckit"
>> >>
>> >> Error: could not find function "invMillsRatio"
>> >>
>> >>
>> >>
>> >> Am I missing something? Do I have to install something apart from R?
>> >> So far I have used
>> >>
>> >> install.packages("sampleSelection",
>> >>     repos = "http://R-Forge.R-project.org")
>> >>
>> >> install.packages("Rcmdr", dependencies = TRUE)
>> >>
>> >> Even then I am unable to run heckit. Please help.
>> >
>> > You have to install (only once) and *load* the package before you can
>> > use it:
>> > R> library( "sampleSelection" )
>> >
>> > I suggest that you do NOT use function "heckit" but function
>> > "selection", because the latter allows you to estimate the model both
>> > by the 2-step and the 1-step (ML) method (depending on argument
>> > "method").
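
To illustrate the "method" argument, here is a small sketch that estimates the same model by both methods. It assumes that the sampleSelection package is installed and uses the Mroz87 example data that ships with it; the two formulas follow the package's help examples as far as I recall them and are not related to the data in this thread.

```r
if (requireNamespace("sampleSelection", quietly = TRUE)) {
  library("sampleSelection")

  # Example data shipped with the package
  data("Mroz87", package = "sampleSelection")
  Mroz87$kids <- (Mroz87$kids5 + Mroz87$kids618 > 0)

  # Two-step (Heckman) estimation
  fit2step <- selection(lfp ~ age + I(age^2) + faminc + kids + educ,
                        wage ~ exper + I(exper^2) + educ + city,
                        data = Mroz87, method = "2step")

  # One-step maximum likelihood estimation of the same model
  fitML <- selection(lfp ~ age + I(age^2) + faminc + kids + educ,
                     wage ~ exper + I(exper^2) + educ + city,
                     data = Mroz87, method = "ml")

  print(coef(fit2step))
  print(coef(fitML))
} else {
  message("Package 'sampleSelection' is not installed.")
}
```

The first formula is the selection equation and the second one is the outcome equation; only the "method" argument differs between the two calls.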
>> >
>> > Our paper about the sampleSelection package, published in the JSS, could
>> > also be helpful for you:
>> > http://www.jstatsoft.org/v27/i07/
>> >
>> > Arne
>> >
>> > --
>> > Arne Henningsen
>> > http://www.arne-henningsen.name
>> >
>>
>>
>>
>> --
>> Arne Henningsen
>> http://www.arne-henningsen.name
>>
>
>
>
> --
> Dr.Saurav Pathak
> PhD, Univ.of.Florida
> Mechanical Engineering
> Doctoral Student
> Innovation and Entrepreneurship
> Imperial College Business School
> s.pathak08 at imperial.ac.uk
> 0044-7795321121
>



-- 
Arne Henningsen
http://www.arne-henningsen.name



