[R] Extrapolating values from a glm fit

Gavin Simpson gavin.simpson at ucl.ac.uk
Thu Jan 27 10:41:40 CET 2011


On Wed, 2011-01-26 at 19:25 -1000, Ahnate Lim wrote:
> Even when I try to predict y values from x, let's say I want to predict y at
> x=0. Looking at the graph from the provided syntax, I would expect y to be
> about 0.85. Is this right:
> 
> predict(mylogit,newdata=as.data.frame(0),type="response")

Your original problem was the use of `newdata = as.data.frame(0.5)`.
There are two problems here: i) if you don't name the input (x = 0.5,
say) then you get a data frame with the name(s) "0.5":

> as.data.frame(0.5)
  0.5
1 0.5

and ii) if you do name it, you still get a data frame with name(s) "0.5"

> as.data.frame(x = 0.5)
  0.5
1 0.5

In both cases, predict wants to find a variable with the name `x` in the
object supplied to `newdata`. It finds `x` but your `x` in the global
workspace, but warns because it knows that `newdata` was a data frame
with a single row - so there was a mismatch and you likely made a
mistake.

In these cases, `data.frame()` is preferred to `as.data.frame()`:

predict(mylogit, newdata = data.frame(x = 0), type = "response")

or we can use a list, to save a few characters:

predict(mylogit, newdata = list(x = 0), type = "response")

which give:

> predict(mylogit, newdata = list(x = 0), type = "response")
       1 
0.813069 
> predict(mylogit, newdata = data.frame(x = 0), type = "response")
       1 
0.813069 

In summary, use `data.frame()` or `list()` to create the object passed
as `newdata` and make sure you give the component containing the new
values a *name* that matches the predictor in the model formula.

HTH

G

> 
> # I get:
> 
> Warning message:
> 'newdata' had 1 rows but variable(s) found have 34 rows
> 
> # Why do I need 34 rows? So I try:
> 
> v=vector()
> v[1:34]=0
> predict(mylogit,newdata=as.data.frame(v),type="response")
> 
> # And I get a matrix of 34 values that appear to fit a logistic function,
> instead of 0.85..
> 
> 
> 
> 
> On Wed, Jan 26, 2011 at 6:59 PM, David Winsemius <dwinsemius at comcast.net>wrote:
> 
> >
> > On Jan 26, 2011, at 10:52 PM, Ahnate Lim wrote:
> >
> >  Dear R-help,
> >>
> >> I have fitted a glm logistic function to dichotomous forced choices
> >> responses varying according to time interval between two stimulus. x
> >> values
> >> are time separation in miliseconds, and the y values are proportion
> >> responses for one of the stimulus. Now I am trying to extrapolate x values
> >> for the y value (proportion) at .25, .5, and .75. I have tried several
> >> predict parameters, and they don't appear to be working. Is this correct
> >> use
> >> and understanding of the predict function? It would be nice to know the
> >> parameters for the glm best fit, but all I really need are the
> >> extrapolated
> >> x values for those proportions. Thank you for your help. Here is the code:
> >>
> >> x <-
> >> c(-283.9, -267.2, -250.5, -233.8, -217.1, -200.4, -183.7, -167,
> >> -150.3, -133.6, -116.9, -100.2, -83.5, -66.8, -50.1, -33.4, -16.7,
> >> 16.7, 33.4, 50.1, 66.8, 83.5, 100.2, 116.9, 133.6, 150.3, 167,
> >> 183.7, 200.4, 217.1, 233.8, 250.5, 267.2, 283.9)
> >>
> >> y <-
> >> c(0, 0.333333333333333, 0, 0, 0, 0, 0, 0, 0, 0.333333333333333,
> >> 0, 0.133333333333333, 0.238095238095238, 0.527777777777778,
> >> 0.566666666666667,
> >> 0.845238095238095, 0.55, 1, 0.888888888888889, 1, 1, 1, 1, 1,
> >> 1, 1, 1, 1, 1, 1, 1, 1, 1, 0.5)
> >>
> >> weight <-
> >> c(1, 3, 2, 5, 4, 4, 3, 5, 5, 4, 5, 11, 22, 11, 15, 16, 11, 7,
> >> 14, 10, 16, 19, 11, 5, 4, 5, 6, 9, 4, 2, 5, 5, 2, 2)
> >>
> >> mylogit <- glm(y~x,weights=weight, family = binomial)
> >>
> >> # now I try plotting the predicted value, and it looks like a good fit,
> >> hopefully I can access what the glm is doing
> >>
> >> ypred <- predict(mylogit,newdata=as.data.frame(x),type="response")
> >> plot(x, ypred,type="l")
> >> points(x,y)
> >>
> >> # so I try to predict the x value when y (proportion) is at .5, but
> >> something is wrong..
> >>
> >
> > Right. Predict goes in the other direction ... x predicts y.
> >
> > Perhaps if you created a function of y w.r.t. x and then inverted it.
> >
> > ?approxfun  # and see the posting earlier this week "Inverse Prediction
> > with splines" where it was demonstrated how to reverse the roles of x and y.
> >
> >>
> >> predict(mylogit,newdata=as.data.frame(0.5))
> >>
> >>        [[alternative HTML version deleted]]
> >>
> >> ______________________________________________
> >> R-help at r-project.org mailing list
> >> https://stat.ethz.ch/mailman/listinfo/r-help
> >> PLEASE do read the posting guide
> >> http://www.R-project.org/posting-guide.html
> >> and provide commented, minimal, self-contained, reproducible code.
> >>
> >
> > David Winsemius, MD
> > West Hartford, CT
> >
> >
> 
> 	[[alternative HTML version deleted]]
> 
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

-- 
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
 Dr. Gavin Simpson             [t] +44 (0)20 7679 0522
 ECRC, UCL Geography,          [f] +44 (0)20 7679 0565
 Pearson Building,             [e] gavin.simpsonATNOSPAMucl.ac.uk
 Gower Street, London          [w] http://www.ucl.ac.uk/~ucfagls/
 UK. WC1E 6BT.                 [w] http://www.freshwaters.org.uk
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%



More information about the R-help mailing list