[R] Predicting and Plotting "hypothetical" values of factors

Peter Dalgaard p.dalgaard at biostat.ku.dk
Thu Nov 17 18:00:45 CET 2005


Paul Johnson <pauljohn at ku.edu> writes:

> Last Friday, I noticed that it is difficult to work with regression 
> models in which there are factors.  It is easier to do the old fashioned 
> thing of coding up "dummy" variables with 0-1 values.  The predict 
> function's newdata argument is not suited to insertion of hypothetical 
> values for the factor, whereas it has no trouble with numeric variables. 
>   For example, if one uses a factor as a predictor in a logistic 
> regression, then it is tough to plot the S-shaped curve that describes 
> the probabilities.
> 
> Here is some code with comments describing the difficulties that we have 
> found.  Are there simpler, less frustrating approaches?

I think the point is that if you think a factor can take intermediate
values in between the groups (as in sex==1.245), then it is really a
numeric variable and should be treated as such in the model, whether
or not it has only two distinct values in your data set.

It is not obvious to me what the "S shaped curve" means in a model
which really only specifies two probabilities.

        -p

 
> -----------------------------
> # Paul Johnson  <pauljohn at ku.edu> 2005-11-17
> # factorFutzing-1.R
> 
> 
> myfac <- factor(c(1.1, 4.5, 1.1, 1.1, 4.5, 1.1, 4.5, 1.1))
> 
> y <- c(0,1,0,1,0,0,1,0)
> 
> mymod1 <-glm (y~myfac, family=binomial)
> 
> p1 <- predict(mymod1, type="response")
> 
> plot(myfac,p1)
> 
> # Wait, that's not the picture I wanted. I want to see the
> # proverbial S-shaped curve!
> #
> 
> # The contrast variable that R creates is coded 0 and 1.
> # I'd like to have a sequence from 0 to 1 (seq(0,1,length.out=10)) and
> # get predicted values for each.  That would give the S-shaped curve.
> 
> 
> # But the use of predict does not work because the factor has 2
> # values and the predict function won't take my new data:
> 
> predict(mymod1, newdata=data.frame(myfac=seq(0,1,length.out=8)))
> 
> # Error: variable 'myfac' was fitted with class "factor" but class 
> "numeric" was supplied
> # In addition: Warning message:
> # variable 'myfac' is not a factor in: model.frame.default(Terms, 
> newdata, na.action = na.action, xlev = object$xlevels)
> 
> # Isn't there an easier way than this?
> 
> c1 <- coef(mymod1)
> 
> myseq <- seq(0,1, length.out=10)
> 
> newdf <- data.frame(1, myseq)
> 
> linearPredictor <- as.matrix(newdf) %*% c1
> 
> p2 <- 1/ (1 + exp(-linearPredictor))
> # But there is a big disaster if you just try the obvious thing
> # of plotting with
> # lines(myseq,p2)
> # The line does not show up where you hope in the plot.
> # The plot "appears to" have the x-axis on the scale
> # 1.1 to 4.5, So in order to see the s-shaped curve, it appears we have
> # to re-scale. However, this is a big illusion.  To see what I mean,
> # do
> 
> points(2,.4)
> 
> # you expected to see the point appear at (2,.4), but in the plot
> # it appears at (4.5,.4).  Huh?
> 
> # The actual values being plotted are the integer-valued levels that
> # R uses for factors, the numbers you get from as.numeric(myfac).
> # So it is only necessary to re-scale the sequence by adding one.
> 
> myseq2 <- 1 +  myseq
> 
> lines( myseq2, p2, type="l" )
> 
> #Its not all that S-shaped, but you have to take what you can get! :)
> -----------------------------
> 
> -- 
> Paul E. Johnson                       email: pauljohn at ku.edu
> Dept. of Political Science            http://lark.cc.ku.edu/~pauljohn
> 1541 Lilac Lane, Rm 504
> University of Kansas                  Office: (785) 864-9086
> Lawrence, Kansas 66044-3177           FAX: (785) 864-5700
> 
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
> 

-- 
   O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark          Ph:  (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)                  FAX: (+45) 35327907




More information about the R-help mailing list