[R] Bias in sample - Logistic Regression

Pedro.Rodriguez at sungard.com
Thu Oct 2 23:10:29 CEST 2008

Hi Shiva,

Maybe you are interested in the following paper:

Learning when Training Data are Costly: The Effect of Class Distribution
on Tree Induction. G. Weiss and F. Provost.  Journal of Artificial
Intelligence Research 19 (2003) 315-354.

For validating the models in those environments, see:

William Elazmeh, Nathalie Japkowicz, Stan Matwin. (2006). A Framework
for Comparative Evaluation of Classifiers in the Presence of Class
Imbalance. Proceedings of the third Workshop on ROC Analysis in Machine
Learning, Pittsburgh, USA.



-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org]
On Behalf Of Wensui Liu
Sent: Wednesday, October 01, 2008 7:20 PM
To: maithili_shiva at yahoo.com
Cc: r-help at r-project.org
Subject: Re: [R] Bias in sample - Logistic Regression

Hi, Shiva,

The idea of reject inference is very simple. Let's assume a credit card
environment. There are 100 applicants, of which 50 will be approved and
booked. Therefore, we can only observe the adverse behavior, such as
default and delinquency, of the 50 booked accounts. Again, let's assume
that, out of the 50 booked cards, 5 are bad (default / delinquency). A
natural thought is to build a model to "cherry pick" the bad guys and
then apply the same model to all 100 applicants.

However, we can only observe the behavior of the 50 booked applicants,
not of all 100 applicants. Therefore, the model result looks better than
it really is. This is the so-called 'sample bias'. The same thing can
happen in healthcare or direct marketing as well.
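
To make the point concrete, here is a minimal simulation sketch in
Python. Everything in it is an assumption made up for illustration (the
latent risk variable, the noisy score, the logistic default probability,
and the 50% approval rule); it is not taken from any real portfolio. It
only shows that the bad rate observed on booked accounts can understate
the bad rate of the full applicant population when approval depends on
something related to default.

```python
import math
import random


def simulate_bad_rates(n_applicants=50_000, approve_fraction=0.5, seed=42):
    """Compare bad rates of booked, rejected, and all applicants.

    All quantities are hypothetical: 'risk' is a latent riskiness,
    'score' is a noisy version of it that the lender uses for approval.
    """
    rng = random.Random(seed)
    applicants = []
    for _ in range(n_applicants):
        risk = rng.gauss(0.0, 1.0)            # latent riskiness (unobserved)
        score = risk + rng.gauss(0.0, 1.0)    # noisy score seen by the lender
        p_bad = 1.0 / (1.0 + math.exp(2.0 - risk))  # assumed default probability
        bad = rng.random() < p_bad
        applicants.append((score, bad))

    # Approve the lowest-score (safest-looking) fraction of applicants.
    applicants.sort(key=lambda a: a[0])
    n_booked = int(n_applicants * approve_fraction)
    booked, rejected = applicants[:n_booked], applicants[n_booked:]

    def bad_rate(rows):
        return sum(1 for _, bad in rows if bad) / len(rows)

    return {
        "booked": bad_rate(booked),      # what the lender can actually observe
        "rejected": bad_rate(rejected),  # never observed in practice
        "all": bad_rate(applicants),     # the population the model is applied to
    }


if __name__ == "__main__":
    for group, rate in simulate_bad_rates().items():
        print(f"{group:>8}: bad rate = {rate:.3f}")
```

In this sketch the booked group shows a lower bad rate than the full
applicant pool, so a model fitted only on booked accounts is trained and
evaluated on an easier sample than the one it will later be applied to.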

Luckily enough, many people have done some excellent work on this topic.
Please do some reading by Heckman. Greene at NYU has a paper in this area
as well. And I believe there is also an implementation in R. If you use
SAS (common in industry), take a look at proc qlim.


WenSui Liu
Acquisition Risk, Chase
Email : wensui.x.liu at chase.com
Blog   : statcompute.spaces.live.com
