[R] logistic regression model specification

Prof Brian Ripley ripley at stats.ox.ac.uk
Tue Nov 13 23:28:00 CET 2007


On Tue, 13 Nov 2007, Dylan Beaudette wrote:

> Hi,
>
> I have set up a simple logistic regression model with the glm() function, with
> the following formula:
>
> y ~ a + b
>
> where:
> 'a' is a continuous variable stratified by
> the levels of 'b'
>
>
> Looking over the manual for model specification, it seems that coefficients
> for unordered factors are given 'against' the first level of that factor.

Only for the default coding (contr.treatment).

> This makes for difficult interpretation when using factor 'b' as a
> stratifying model term.

Really?  You realize that you have not 'stratified' on 'b', which would 
need the model to be a*b?  What you have is a model with parallel linear 
predictors for different levels of 'b', and if the coefficients are not 
telling you what you want you should change the coding.
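
As a rough illustration of the difference between `a + b` and `a * b`, the sketch below uses Python/NumPy (not R) to build the model matrices under R's default treatment coding. The helper names (`treatment_dummies`, `design_additive`, `design_interaction`) and the toy data are hypothetical. The additive matrix gives each level of 'b' its own intercept but one shared slope in 'a' (parallel linear predictors). The interaction matrix adds `a:b` columns so the slopes differ as well. In R itself, `model.matrix(~ a + b)` and `model.matrix(~ a * b)` display the corresponding matrices.

```python
import numpy as np

def treatment_dummies(b, levels):
    """Indicator columns for every level except the first (the 'base level'),
    mimicking R's default contr.treatment coding of an unordered factor."""
    return np.column_stack([(b == lev).astype(float) for lev in levels[1:]])

def design_additive(a, b, levels):
    """Columns for y ~ a + b: intercept, one common slope for a,
    and a shift in intercept for each non-base level of b."""
    return np.column_stack([np.ones_like(a), a, treatment_dummies(b, levels)])

def design_interaction(a, b, levels):
    """Columns for y ~ a * b: the additive columns plus a:b products,
    so each level of b also gets its own slope in a."""
    d = treatment_dummies(b, levels)
    return np.column_stack([design_additive(a, b, levels), d * a[:, None]])

# Hypothetical toy data: continuous 'a', factor 'b' with levels x, y, z.
a = np.array([0.5, 1.0, 2.0, 0.5, 1.5, 3.0])
b = np.array(["x", "x", "y", "y", "z", "z"])
levels = ["x", "y", "z"]

X_add = design_additive(a, b, levels)     # 4 columns: (Intercept), a, by, bz
X_int = design_interaction(a, b, levels)  # 6 columns: ... plus a:by, a:bz
print(X_add)
print(X_int)
```

The column order mirrors R's: intercept, `a`, the `b` dummies, then the `a:b` products.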

Much of what I am trying to get across is that you have a lot of choice as 
to how you specify a model to R. There has to be a default, which is 
chosen because it is often a good choice.  It does rely on factors being 
coded well: the 'base level' (to quote ?contr.treatment) needs to be 
interpretable.  And also bear in mind that the default choices of 
statistical software in this area almost all differ (and R's differs from 
S, GLIM, some ways to do this in SAS ...), so people's ideas of a 'good 
choice' do differ.
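
The choice of coding changes what the coefficients mean, not the model. A minimal NumPy sketch, assuming three levels and made-up per-level values `eta` of the linear predictor, solves for the coefficients under two codings that mimic R's `contr.treatment` and `contr.sum` (the Python function names are hypothetical stand-ins, not R's implementation).

```python
import numpy as np

def contr_treatment(k):
    """k x (k-1) contrast matrix with the first level as base, like R's contr.treatment(k)."""
    return np.vstack([np.zeros((1, k - 1)), np.eye(k - 1)])

def contr_sum(k):
    """k x (k-1) sum-to-zero contrast matrix, like R's contr.sum(k)."""
    return np.vstack([np.eye(k - 1), -np.ones((1, k - 1))])

# Hypothetical values of the linear predictor for each of the three levels.
eta = np.array([1.0, 1.5, 3.0])

coefs = {}
for name, C in [("treatment", contr_treatment(3)), ("sum", contr_sum(3))]:
    # One row per level: intercept column plus that level's contrast row.
    M = np.column_stack([np.ones(3), C])
    # Coefficients that reproduce eta exactly under this coding.
    coefs[name] = np.linalg.solve(M, eta)
    print(name, np.round(coefs[name], 4))
```

Treatment coding returns the base level's value followed by differences from it; sum coding returns the average over levels followed by deviations from that average. Both describe the same three values of `eta`. In R the coding can be changed per fit via the `contrasts` argument of `glm()`, globally via `options(contrasts = ...)`, or the base level moved with `relevel()`.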

> Setting up the model, minus the intercept term, gives me what appear to be
> more meaningful coefficients. However, I am not sure if I am interpreting the
> results from a linear model without an intercept term correctly. Model predictions from
> both specifications (with and without an intercept term) are nearly identical
> (different by about 1E-16 in probability space).
>
> Are there any gotchas to look out for when removing the intercept term from
> such a model?

It is just a different parametrization of the linear predictor. 
Anything interpretable in terms of the predictions of the model will be 
unchanged.  That is the crux: the default coefficients of 'b' will be 
log odds-ratios that are directly interpretable, and those in the 
per-group coding will be log-odds for a zero value of 'a'. Does a zero 
value of 'a' make sense?
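
To check this numerically, here is a self-contained Python/NumPy sketch that fits the same logistic model in both parametrizations by iteratively reweighted least squares (the fitting scheme R's `glm()` uses). The simulated data, the seed, and the `fit_logistic` helper are assumptions for illustration. In R the comparison would be between `glm(y ~ a + b, family = binomial)` and `glm(y ~ a + b - 1, family = binomial)`.

```python
import numpy as np

def fit_logistic(X, y, n_iter=50, tol=1e-12):
    """Logistic regression by iteratively reweighted least squares (IRLS)."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        eta = X @ beta
        mu = 1.0 / (1.0 + np.exp(-eta))
        w = mu * (1.0 - mu)              # IRLS weights for the logit link
        z = eta + (y - mu) / w           # working response
        WX = X * w[:, None]
        beta_new = np.linalg.solve(X.T @ WX, WX.T @ z)  # weighted least squares step
        converged = np.max(np.abs(beta_new - beta)) < tol
        beta = beta_new
        if converged:
            break
    return beta

# Hypothetical simulated data: continuous 'a', factor 'b' with 3 levels.
rng = np.random.default_rng(1)
n = 300
a = rng.normal(size=n)
b = rng.integers(0, 3, size=n)           # levels 0, 1, 2; level 0 is the base
true_eta = np.array([-0.5, 0.3, 1.0])[b] + 0.8 * a
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-true_eta))).astype(float)

D = np.eye(3)[b]                                        # one indicator per level
X_default = np.column_stack([np.ones(n), a, D[:, 1:]])  # like y ~ a + b
X_nointer = np.column_stack([a, D])                     # like y ~ a + b - 1

beta_default = fit_logistic(X_default, y)  # intercept, slope, log odds-ratios
beta_nointer = fit_logistic(X_nointer, y)  # slope, per-group log-odds at a = 0

p_default = 1.0 / (1.0 + np.exp(-X_default @ beta_default))
p_nointer = 1.0 / (1.0 + np.exp(-X_nointer @ beta_nointer))
print(np.max(np.abs(p_default - p_nointer)))
```

The fitted probabilities agree to rounding error, the slope for 'a' is the same, and each per-group coefficient equals the default intercept plus that level's treatment coefficient. So differences such as `beta_nointer[2] - beta_nointer[1]` recover the log odds-ratios, while the per-group values themselves are log-odds at `a = 0`.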

> Any guidance would be greatly appreciated.
>
> Cheers,
>
>

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595