[R] GLM/GAM and unobserved heterogeneity

Thu Aug 25 06:08:28 CEST 2005

	  Have you considered "lmer" in library(lme4)?  See, for example, sec. 4 
on "Two-level models for binary data" in vignette("MlmSoftRev") with 
library(mlmRev), in addition to www.r-project.org -> "Documentation: 
Newsletter" -> "R News Volume 5/1" -> "Fitting Linear Mixed Models in R" 
by Doug Bates, pp. 27-30.
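
	  For a rough idea of what such a model looks like on loan-month 
data, here is a minimal sketch, not a tested recipe for the real data.  
Everything in it is an assumption made for illustration: the simulated 
data, the names loan_month, score, frailty and terminated, and the 
parameter values.  It also assumes a current lme4, where binary 
responses are fitted with glmer().  With at most one termination per 
loan, the random-intercept variance may be only weakly identified, so 
treat that estimate with caution.

```r
# Simulated loan-month data. All names and parameter values are
# hypothetical and chosen only for illustration.
set.seed(1)
n_loans   <- 300
max_month <- 24

# Loan-level unobserved heterogeneity: a random intercept on the logit scale.
frailty <- rnorm(n_loans, mean = 0, sd = 1)
score   <- rnorm(n_loans)  # standardized static covariate

rows <- lapply(seq_len(n_loans), function(i) {
  # Monthly termination probability for loan i (held constant over months).
  p    <- plogis(-3 - 0.5 * score[i] + frailty[i])
  term <- rbinom(max_month, size = 1, prob = p)
  # Keep months up to and including the first termination, if any;
  # otherwise the loan is censored at max_month.
  last <- if (any(term == 1)) which(term == 1)[1] else max_month
  data.frame(loan = i, month = seq_len(last), score = score[i],
             terminated = term[seq_len(last)])
})
loan_month <- do.call(rbind, rows)
loan_month$loan <- factor(loan_month$loan)

# Ordinary logit on the loan-month records, ignoring heterogeneity.
fit_glm <- glm(terminated ~ score, family = binomial, data = loan_month)

# Same model plus a random intercept per loan. Its estimated variance is one
# summary of the heterogeneity left after conditioning on the covariates.
if (requireNamespace("lme4", quietly = TRUE)) {
  fit_glmm <- lme4::glmer(terminated ~ score + (1 | loan),
                          family = binomial, data = loan_month)
  print(lme4::VarCorr(fit_glmm))
}
```

	  Refitting with more covariates and comparing the printed 
random-intercept variance across models is one simple way to see 
whether the added variables absorb some of the heterogeneity.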

	  If you have more questions after reviewing this material, please 
submit another question, preferably following the posting guide: 
"http://www.R-project.org/posting-guide.html".  The posting guide is not 
just another symbol of bureaucracy.  It was written to help questioners 
improve their chances of getting the information they want quickly.  I 
believe it is quite effective when it is used.  Many people get answers 
to their questions in minutes, but that requires a question that a 
potential respondent can understand, and answer sensibly, in seconds.

	  spencer graves

Kyle G. Lundstedt wrote:

> Hello,
>      I'm interested in correcting for and measuring unobserved  
> heterogeneity ("missing variables") using R.  In particular, I'm  
> searching for a simple way to measure the amount of unobserved  
> heterogeneity remaining in a series of increasingly complex models  
> (adding additional variables to each new model) on the same data.
>      I have a static database of 400,000 or so individual mortgage  
> loans, each of which is observed monthly from origination (t=0) until  
> termination (a binary yes/no variable).  In my update database, there  
> are up to 60 months of observed data for each loan in the static  
> database, and an individual loan has an "average life" of roughly 36  
> months.
>      Each loan has static covariates observed at origination, such as  
> original loan amount and credit score, as well as time-varying  
> covariates (TVC) such as age, interest rates, and house prices.   
> Because these TVC change each month, I've constructed a modeling  
> database that merges the static database with the update database.
>      The resulting "loan-month" modeling database has one observation  
> for every loan-month, and the static covariates remain the same for  
> all loan-months for a given loan.  Thus, the modeling database has  
> roughly 14.4 million loan-month records.  A loan is considered  
> "active" as long as it has not yet terminated or been censored; my  
> interest is in predicting termination.
>      This type of data is often referred to as "event history" or  
> "discrete hazard" data.  The standard R package to apply to such data  
> is "survival", with which I could estimate a Cox proportional hazard  
> model using coxph.  The advantage of such an approach is that  
> unobserved heterogeneity is easily addressed using the "frailty" term.
>      The disadvantages, at least for my purposes, are two-fold.   
> First, my audience is unfamiliar with hazard models.  Second, my  
> monthly data has many "ties" (many terminations in the same month),  
> so I've been told that coxph won't work well on a large dataset with  
> many ties.
>      On the other hand, because the data is measured discretely each  
> month, many references suggest applying generalized linear models  
> (GLM, "logit"-type models) or even generalized additive models  
> (GAM, "logit"-type models that incorporate nonlinearity in individual  
> covariates).  The advantage to this approach is that GLM and GAM are  
> readily available in R, and my audience is very familiar with logit- 
> type models.
>      The disadvantage, however, is that I am totally unfamiliar with  
> ways to correct for and measure unobserved heterogeneity using GLM/ 
> GAM-type models.  I've been told that unobserved heterogeneity in the  
> hazard framework is analogous to random effects in the GLM/GAM  
> framework, but there seem to be a number of R packages that address  
> this issue in different ways.
>      So, I'd greatly appreciate suggestions on a simple way to  
> incorporate unobserved heterogeneity into a GLM/GAM-type model.  I'm  
> not much of a statistician, so simple examples are always helpful.   
> I'm also happy to track down specific article/book references, if  
> folks think those might be of help.
> 
> Many thanks,
> Kyle
> ---
> kyle  at  hotmail . com
> (email altered in obvious ways)
> 

-- 
Spencer Graves, PhD
Senior Development Engineer
PDF Solutions, Inc.
333 West San Carlos Street Suite 700
San Jose, CA 95110, USA

spencer.graves at pdf.com
www.pdf.com
Tel:  408-938-4420
Fax: 408-280-7915