[R] Predicting and Plotting "hypothetical" values of factors

Paul Johnson pauljohn at ku.edu
Thu Nov 17 17:47:25 CET 2005


Last Friday, I noticed that it is difficult to work with regression 
models in which there are factors.  It is easier to do the old fashioned 
thing of coding up "dummy" variables with 0-1 values.  The predict 
function's newdata argument is not suited to insertion of hypothetical 
values for the factor, whereas it has no trouble with numeric variables. 
  For example, if one uses a factor as a predictor in a logistic 
regression, then it is tough to plot the S-shaped curve that describes 
the probabilities.

Here is some code with comments describing the difficulties that we have 
found.  Are there simpler, less frustrating approaches?

-----------------------------
# Paul Johnson  <pauljohn at ku.edu> 2005-11-17
# factorFutzing-1.R


myfac <- factor(c(1.1, 4.5, 1.1, 1.1, 4.5, 1.1, 4.5, 1.1))

y <- c(0,1,0,1,0,0,1,0)

mymod1 <-glm (y~myfac, family=binomial)

p1 <- predict(mymod1, type="response")

plot(myfac,p1)

# Wait, that's not the picture I wanted. I want to see the
# proverbial S-shaped curve!
#

# The contrast variable that R creates is coded 0 and 1.
# I'd like to have a sequence from 0 to 1 (seq(0,1,length.out=10)) and
# get predicted values for each.  That would give the S-shaped curve.


# But the use of predict does not work because the factor has 2
# values and the predict function won't take my new data:

predict(mymod1, newdata=data.frame(myfac=seq(0,1,length.out=8)))

# Error: variable 'myfac' was fitted with class "factor" but class 
"numeric" was supplied
# In addition: Warning message:
# variable 'myfac' is not a factor in: model.frame.default(Terms, 
newdata, na.action = na.action, xlev = object$xlevels)

# Isn't there an easier way than this?

c1 <- coef(mymod1)

myseq <- seq(0,1, length.out=10)

newdf <- data.frame(1, myseq)

linearPredictor <- as.matrix(newdf) %*% c1

p2 <- 1/ (1 + exp(-linearPredictor))
# But there is a big disaster if you just try the obvious thing
# of plotting with
# lines(myseq,p2)
# The line does not show up where you hope in the plot.
# The plot "appears to" have the x-axis on the scale
# 1.1 to 4.5, So in order to see the s-shaped curve, it appears we have
# to re-scale. However, this is a big illusion.  To see what I mean,
# do

points(2,.4)

# you expected to see the point appear at (2,.4), but in the plot
# it appears at (4.5,.4).  Huh?

# The actual values being plotted are the integer-valued levels that
# R uses for factors, the numbers you get from as.numeric(myfac).
# So it is only necessary to re-scale the sequence by adding one.

myseq2 <- 1 +  myseq

lines( myseq2, p2, type="l" )

#Its not all that S-shaped, but you have to take what you can get! :)
-----------------------------

-- 
Paul E. Johnson                       email: pauljohn at ku.edu
Dept. of Political Science            http://lark.cc.ku.edu/~pauljohn
1541 Lilac Lane, Rm 504
University of Kansas                  Office: (785) 864-9086
Lawrence, Kansas 66044-3177           FAX: (785) 864-5700




More information about the R-help mailing list