[R] Visualizing binary response data?

Frank E Harrell Jr f.harrell at Vanderbilt.Edu
Wed May 5 04:17:53 CEST 2010

On 05/04/2010 09:12 PM, Thomas Stewart wrote:
> For binary w.r.t. continuous, how about a smoothing spline?  As in,
>
> x<-rnorm(100)
> y<-rbinom(100,1,exp(.3*x-.07*x^2)/(1+exp(.3*x-.07*x^2)))
> plot(x,y)
> lines(smooth.spline(x,y))
>
> OR how about a more parametric approach, logistic regression?  As in,
>
> glm1<-glm(y~x+I(x^2),family=binomial)
> plot(x,y)
> lines(sort(x),predict(glm1,newdata=data.frame(x=sort(x)),type="response"))
>
> FOR binary w.r.t. categorical it depends.  Are the categories ordinal (is
> there a natural ordering?) or are the categories nominal (no ordering)?  For
> nominal categories, the data is essentially a contingency table, and
> "strength of the predictor" is a test of independence.  You can still do a
> graphical exploration: maybe plotting the proportion of Y=1 for each
> category of X.   As in,
>
> z<-cut(x,breaks=-3:3)
> plot(tapply(y,z,mean))
>
> If your goal is to find strong predictors of Y, you may want to consider
> graphical measures that look at the predictors jointly.  Maybe with a
> generalized additive model (gam)?
>
> There is probably a lot more you can do.  Be creative.
>
> -tgs

And you have to decide why you would look to a graph to select
predictors.  Choosing predictors by how they look against the response
can badly distort later inferences: confidence intervals, P-values,
regression coefficients, and R^2 all become biased.
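
A small simulation can make the distortion concrete.  The following is
only a sketch under assumed settings (10 pure-noise predictors, 100
observations, 200 replications; none of these numbers come from this
thread).  Picking the column with the largest absolute correlation with
y is used as a stand-in for "choosing the predictor that looks best on
a plot".

```r
set.seed(1)      # arbitrary seed, for reproducibility only
n_sim  <- 200    # number of simulated data sets (assumed)
n_obs  <- 100    # observations per data set (assumed)
n_pred <- 10     # candidate predictors, all pure noise (assumed)

p_selected <- replicate(n_sim, {
  x <- matrix(rnorm(n_obs * n_pred), n_obs, n_pred)
  y <- rbinom(n_obs, 1, 0.5)   # response is independent of every predictor
  # Stand-in for graphical selection: keep the predictor that
  # appears most strongly related to y in this sample
  best <- which.max(abs(cor(x, y)))
  fit  <- glm(y ~ x[, best], family = binomial)
  # Wald P-value reported as if no selection had taken place
  coef(summary(fit))[2, "Pr(>|z|)"]
})

# The nominal type I error is 0.05; compare it with the rejection
# rate obtained after selecting the best-looking predictor
reject_rate <- mean(p_selected < 0.05)
print(reject_rate)
```

Because no predictor is truly related to y, any rejection rate well
above 0.05 comes entirely from the selection step.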

> On Tue, May 4, 2010 at 9:04 PM, Kim Jung Hwa <kimhwamaillist at gmail.com> wrote:
>> Hi All,
>> I'm dealing with binary response data for the first time, and I'm confused
>> about what kind of graphics I could explore in order to pick relevant
>> predictors and their relation with the response variable.
>> I have 8-10 continuous predictors and 4-5 categorical predictors. Can anyone
>> suggest what kind of graphics I can explore to see how predictors behave
>> w.r.t. the response variable...
>> Any help would be greatly appreciated, thanks,
>> Kim
Frank E Harrell Jr   Professor and Chairman        School of Medicine
                      Department of Biostatistics   Vanderbilt University
