[R] PROBIT REGRESSION FOR GROUPED/CLUSTERED DATA

Thu Jul 16 12:41:56 CEST 2009

Dear Saurav,

I get the feeling that you are looking for mixed models. Try something
like:

library(lme4)
glmer(s ~ age + gender + gemedu + gemhinc + es_gdppc + imf_pop +
estbbo_m + (1|yearctry), family = binomial(link = "probit"), data =
adpopdata) 
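
To illustrate what such a model does, here is a minimal sketch on
simulated data. The data frame sim, its sizes and its coefficients are
made up for the example; only the model structure (a probit with a random
intercept per yearctry) mirrors the call above. As far as I know, this is
the same kind of model that Stata's xtprobit fits by default (random
effects), but please check that against your Stata output.

```r
library(lme4)

set.seed(42)
n_groups <- 30                      # hypothetical number of yearctry groups
n_per    <- 100                     # hypothetical observations per group
g <- rep(seq_len(n_groups), each = n_per)
u <- rnorm(n_groups, sd = 0.5)      # true group-level random intercepts

# Simulated stand-in for adpopdata with a single covariate
sim <- data.frame(yearctry = factor(g), age = rnorm(n_groups * n_per))
sim$s <- rbinom(nrow(sim), size = 1,
                prob = pnorm(-0.3 + 0.5 * sim$age + u[g]))

# Probit model with a random intercept for each yearctry
fit <- glmer(s ~ age + (1 | yearctry),
             family = binomial(link = "probit"), data = sim)

print(fixef(fit))    # fixed-effect estimates (intercept and age)
print(VarCorr(fit))  # estimated between-group standard deviation
```

The fixed effects are shared across all groups, while each yearctry gets
its own intercept shift drawn from a common normal distribution.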

HTH,

Thierry

----------------------------------------------------------------------------
ir. Thierry Onkelinx
Instituut voor natuur- en bosonderzoek / Research Institute for Nature
and Forest
Cel biometrie, methodologie en kwaliteitszorg / Section biometrics,
methodology and quality assurance
Gaverstraat 4
9500 Geraardsbergen
Belgium
tel. + 32 54/436 185
Thierry.Onkelinx at inbo.be
www.inbo.be

To call in the statistician after the experiment is done may be no more
than asking him to perform a post-mortem examination: he may be able to
say what the experiment died of.
~ Sir Ronald Aylmer Fisher

The plural of anecdote is not data.
~ Roger Brinner

The combination of some data and an aching desire for an answer does not
ensure that a reasonable answer can be extracted from a given body of
data.
~ John Tukey

-----Original message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org]
On behalf of saurav pathak
Sent: Thursday 16 July 2009 12:18
To: r-help at r-project.org
Subject: [R] PROBIT REGRESSION FOR GROUPED/CLUSTERED DATA

Hello all,

I have been working to fix this for weeks now. It should be simple to
fix. Please help.

Let me explain what I am doing. I have a data set for 65 countries over
a period of 9 years (2000-2008). Each country has on average about 2000
interviews, so the full set has roughly 65*9*2000 data
points/observations (of course there are missing values as well). Now let
me explain how the data are clustered or grouped. I use the variable
"yearctry", which is computed as year*10000 + the international phone
code of the country. For example, the USA, with calling code 001, has a
yearctry value of 2000001 for the year 2000. Under this value of yearctry
there are roughly 2000 observations. For the same year, the UK has a
yearctry value of 2000044 (again roughly 2000 observations), and so on
for the remaining 63 countries and for all other years from 2000 to 2008.
For the year 2001, the yearctry values for the USA and the UK are 2001001
and 2001044 respectively (again roughly 2000 observations per country),
and so on for the other 63 countries. So the data set is
*grouped/clustered using "yearctry"*.
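
The construction of the key can be sketched as follows. The vectors year
and phone_code are hypothetical example values. Note that the example
values quoted above (2000001, 2000044, 2001001, 2001044) correspond to a
multiplier of 1000; a multiplier of 10000 would give 20000001 for the USA
in 2000. The sketch uses 1000 so that it reproduces the quoted values,
which is an assumption about what was intended.

```r
# Hypothetical example values (not taken from the real data set)
year       <- c(2000, 2000, 2001, 2001)
phone_code <- c(1, 44, 1, 44)          # USA = 1, UK = 44

# Multiplier chosen to match the example values in the text (assumption)
multiplier <- 1000

# Build the grouping key: one value per country-year combination
yearctry <- year * multiplier + phone_code
print(yearctry)

# The two components can be recovered from the key again
year_back <- yearctry %/% multiplier   # integer division gives the year
code_back <- yearctry %% multiplier    # remainder gives the calling code
```

Whatever multiplier is used, it has to be larger than the largest calling
code in the data, otherwise two different country-years could end up with
the same key.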

I am trying to look for a selection bias, if any, within each "yearctry"
value (i.e. roughly 2000 observations per country per year, for 9 years
and 65 countries). Essentially, I want to check 65*9 values of
"yearctry", each with roughly 2000 observations. Hence I use glm with a
probit link to model the selection, where my dependent variable "s" is
either 0 or 1. The formula

myProbit <- glm(s ~ age + gender + gemedu + gemhinc + es_gdppc + imf_pop +
                estbbo_m, family = binomial(link = "probit"),
                data = adpopdata)

is the Heckman selection equation based on all observations, without
taking into account the fact that each "yearctry" is unique. I want the
selection equation to recognise the uniqueness of each "yearctry" value:
take one "yearctry" at a time, estimate the probit, go to the next
"yearctry", repeat the probit regression, and then give me the result. At
the moment I do not accomplish that with the above formula, which runs
the regression on the pooled data. I want it to distinguish one yearctry
from another, perform the regression for all yearctry values, and finally
give me the result.
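
Read literally, "one yearctry at a time" could be sketched in base R as
below, with a separate probit per group. The data frame sim and its
single covariate are simulated stand-ins, not the real adpopdata. This is
only one possible reading of the request, and it is a different model
from a single regression that accounts for the grouping.

```r
set.seed(1)

# Simulated stand-in for adpopdata: 3 yearctry groups, one covariate
n <- 600
sim <- data.frame(
  yearctry = rep(c(2000001, 2000044, 2001001), each = n / 3),
  age      = rnorm(n, mean = 40, sd = 10)
)
sim$s <- rbinom(n, size = 1, prob = pnorm(-2 + 0.05 * sim$age))

# Split the data by yearctry and fit one probit per group
fits <- lapply(split(sim, sim$yearctry), function(d) {
  glm(s ~ age, family = binomial(link = "probit"), data = d)
})

# Collect the coefficients: one row per yearctry value
coefs <- t(sapply(fits, coef))
print(coefs)
```

One caveat with separate fits: any variable that does not vary within a
yearctry (possibly es_gdppc and imf_pop, if they are country-year
aggregates) cannot be estimated inside a single group.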

Is there any model other than glm that is recommended for this job? If
yes, please tell me how to use it.

Let me give you the exact command that Stata uses, so that things become
very clear:

xtprobit s age gender gemeduc gemhinc es_gdppc imf_pop estbbo_m, i(yearctry)

This does exactly what I wish to accomplish in R, i.e. it estimates the
Heckman selection equation for the selection variables (seven in my case)
based on the uniqueness of "yearctry".

I have worked on this for weeks, so kindly help me. I think it is a small
issue to fix in the equation, but since I am new to R, I do not know
exactly what will fix my problem. Any help will be highly appreciated.
Thanks

--
Dr.Saurav Pathak
PhD, Univ.of.Florida
Mechanical Engineering
Doctoral Student
Innovation and Entrepreneurship
Imperial College Business School
s.pathak08 at imperial.ac.uk
0044-7795321121



The views expressed in this message
and any annex are purely those of the writer and may not be regarded as stating 
an official position of INBO, as long as the message is not confirmed by a duly 
signed document.