[R] fitting in logistic model

Tue Jun 30 16:52:20 CEST 2009

On Jun 30, 2009, at 4:54 AM, Renzi Fabrizio wrote:

> I would like to know how R computes the probability of an event in a
> logistic model (P(y=1)) from the score s, linear combination of x and
> beta.
>
> I noticed that there are differences (small, less than e-16) between  
> the
> fitting values automatically computed in the glm procedure by R, and  
> the
> values "manually" computed by me applying the reverse formula
> p=e^s/(1+e^s); moreover I noticed  that the minimum value of the  
> fitting
> values in my estimation is 2.220446e-16, and there are many  
> observation
> with this probability (instead the minimum value obtained by  
> "manually"
> estimation is 2.872636e-152).

It would be helpful to see at least a subset of the output from your  
model and your manual computations so that we can at least visually  
compare the results to see where the differences may be.

The model object returned from using glm() will contain both the  
linear predictors on the link scale (model$linear.predictors) and the  
fitted values (model$fitted.values). The latter can be accessed using  
the fitted() extractor function.

To use an example, let's create a simple LR model using the infert  
data set as referenced in ?glm.

model1 <- glm(case ~ spontaneous + induced, data = infert, family =  
binomial())

 > model1

Call:  glm(formula = case ~ spontaneous + induced, family =  
binomial(), data = infert)

Coefficients:
(Intercept)  spontaneous      induced
     -1.7079       1.1972       0.4181

Degrees of Freedom: 247 Total (i.e. Null);  245 Residual
Null Deviance:	    316.2
Residual Deviance: 279.6 	AIC: 285.6

# Get the coefficients
 > coef(model1)
(Intercept) spontaneous     induced
  -1.7078601   1.1972050   0.4181294

# get the linear predictor values
# log odds scale for binomial glm
 > head(model1$linear.predictors)
           1           2           3           4           5           6
  1.10467939 -1.28973068 -0.87160128 -0.87160128 -0.09252564  0.32560375

You can also get the above by using the coefficients and the model  
matrix for comparison:

# the full set of 248
# coef(model1) %*% t(model.matrix(model1))

 > head(as.vector(coef(model1) %*% t(model.matrix(model1))))
[1]  1.10467939 -1.28973068 -0.87160128 -0.87160128 -0.09252564   
0.32560375

# get fitted response values (predicted probs)
 > head(fitted(model1))
         1         2         3         4         5         6
0.7511359 0.2158984 0.2949212 0.2949212 0.4768851 0.5806893

We can also get the fitted values from the linear predictor values by  
using:

LP <- model1$linear.predictors

 > head(exp(LP) / (1 + exp(LP)))
         1         2         3         4         5         6
0.7511359 0.2158984 0.2949212 0.2949212 0.4768851 0.5806893

You can also get the above by using the predict.glm() function with  
type == "response". The default type of "link" will get you the linear  
predictor values as above. predict.glm() would typically be used to  
generate predictions using the model on new data.

 > head(predict(model1, type = "response"))
         1         2         3         4         5         6
0.7511359 0.2158984 0.2949212 0.2949212 0.4768851 0.5806893

In glm.fit(), which is the workhorse function in glm(), the fitted  
values returned in the model object are actually computed by using the  
inverse link function for the family passed to glm():

 > binomial()$linkinv
function (eta)
.Call("logit_linkinv", eta, PACKAGE = "stats")
<environment: 0x171c8b10>

Thus:

 > head(binomial()$linkinv(model1$linear.predictors))
         1         2         3         4         5         6
0.7511359 0.2158984 0.2949212 0.2949212 0.4768851 0.5806893

So those are the same values as we saw above using the other methods.  
So, all is consistent across the various methods.

Perhaps the above provides some insights for you into how R does some  
of these things and also to point out, as is frequently the case with  
R, there is more than one way to get the same result.

HTH,

Marc Schwartz