[R] logistic regression model specification

Dylan Beaudette dylan.beaudette at gmail.com
Tue Nov 13 23:53:31 CET 2007


On Tuesday 13 November 2007, Prof Brian Ripley wrote:
> On Tue, 13 Nov 2007, Dylan Beaudette wrote:
> > Hi,
> >
> > I have set up a simple logistic regression model with the glm() function,
> > with the following formula:
> >
> > y ~ a + b
> >
> > where:
> > 'a' is a continuous variable stratified by
> > the levels of 'b'
> >
> >
> > Looking over the manual for model specification, it seems that
> > coefficients for unordered factors are given 'against' the first level of
> > that factor.

Thanks for the quick reply.

> Only for the default coding.

Indeed, I should have added that to my initial message.

> > This makes for difficult interpretation when using factor 'b' as a
> > stratifying model term.
>
> Really?  You realize that you have not 'stratified' on 'b', which would
> need the model to be a*b?  What you have is a model with parallel linear
> predictors for different levels of 'b', and if the coefficients are not
> telling you what you want you should change the coding.

I should have specified that interpretation was difficult not because of the
default behaviour, but because of my own limitations and the nature of the
data. Perhaps an example would help.

y ~ a + b

'a' is a continuous predictor (e.g. temperature)
observed on the levels of 'b' (geology)

The form of the model (or at least what I was hoping for) would account for
the variation in 'y' as predicted by 'a', within each level of 'b'. Am I
specifying this model incorrectly?
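
For reference, here is a minimal sketch of the two specifications under
discussion, using simulated data. All level names and parameter values are
hypothetical and are not taken from the actual experiment. It contrasts
y ~ a + b (one common slope for 'a', a separate offset for each level of 'b')
with y ~ a * b (a separate slope for 'a' within each level of 'b').

```r
# Simulated data: every name and number below is an illustrative assumption
set.seed(1)
n <- 300
b <- factor(sample(c("basalt", "granite", "shale"), n, replace = TRUE))
a <- rnorm(n, mean = 15, sd = 5)   # stands in for something like temperature

# Give each level of 'b' its own slope, so the interaction is real
slope <- c(basalt = 0.10, granite = 0.30, shale = -0.20)[as.character(b)]
y <- rbinom(n, 1, plogis(-1 + slope * (a - 15)))

# Parallel linear predictors: one slope for 'a', shifted by level of 'b'
fit_parallel <- glm(y ~ a + b, family = binomial)

# Separate slope for 'a' within each level of 'b'
fit_separate <- glm(y ~ a * b, family = binomial)

coef(fit_parallel)   # intercept, a, and two offsets for 'b'
coef(fit_separate)   # adds two slope adjustments (a:b terms)

# Likelihood-ratio test of whether the extra slopes are needed
anova(fit_parallel, fit_separate, test = "Chisq")
```

If the a:b terms contribute little, the simpler a + b model may be adequate;
if they matter, a + b is not describing the effect of 'a' within each level
of 'b'.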

> Much of what I am trying to get across is that you have a lot of choice as
> to how you specify a model to R. There has to be a default, which is
> chosen because it is often a good choice.  It does rely on factors being
> coded well: the 'base level' (to quote ?contr.treatment) needs to be
> interpretable.  And also bear in mind that the default choices of
> statistical software in this area almost all differ (and R's differs from
> S, GLIM, some ways to do this in SAS ...), so people's ideas of a 'good
> choice' do differ.

Understood. I was not implying a level of 'goodness', but rather hoping to
gain some insight into a (possibly) mis-specified model.
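
As a sketch of what "changing the coding" can look like in practice, the
following uses simulated data (level names and effect sizes are hypothetical)
to fit the same model with the default treatment contrasts, with a different
base level via relevel(), and with sum-to-zero contrasts via the 'contrasts'
argument of glm(). The coefficients differ, but the fitted probabilities
should not.

```r
# Simulated data: names and values are illustrative assumptions only
set.seed(2)
n <- 200
b <- factor(sample(c("alluvium", "granite", "shale"), n, replace = TRUE))
a <- rnorm(n, mean = 15, sd = 5)
shift <- c(alluvium = 0, granite = 1, shale = -1)[as.character(b)]
y <- rbinom(n, 1, plogis(-0.5 + 0.1 * (a - 15) + shift))
d <- data.frame(y = y, a = a, b = b)

# Default coding (contr.treatment): the base level is the first level, "alluvium"
fit_default <- glm(y ~ a + b, family = binomial, data = d)

# Same model with a different, perhaps more interpretable, base level
d$b2 <- relevel(d$b, ref = "shale")
fit_relevel <- glm(y ~ a + b2, family = binomial, data = d)

# Same model with sum-to-zero contrasts instead of a base level
fit_sum <- glm(y ~ a + b, family = binomial, data = d,
               contrasts = list(b = "contr.sum"))

coef(fit_default)
coef(fit_relevel)
coef(fit_sum)

# The fits agree: only the parametrization differs
all.equal(fitted(fit_default), fitted(fit_relevel))
all.equal(fitted(fit_default), fitted(fit_sum))
```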

>
> > Setting up the model, minus the intercept term, gives me what appear to
> > be more meaningful coefficients. However, I am not sure if I am correctly
> > interpreting the results from a linear model without an intercept term.
> > Model predictions from both specifications (with and without an intercept
> > term) are nearly identical (different by about 1E-16 in probability
> > space).
> >
> > Are there any gotchas to look out for when removing the intercept term
> > from such a model?
>
> It is just a different parametrization of the linear predictor.
> Anything interpretable in terms of the predictions of the model will be
> unchanged.  That is the crux: the default coefficients of 'b' will be
> log odds-ratios that are directly interpretable, and those in the
> per-group coding will be log-odds for a zero value of 'a'. Does a zero
> value of 'a' make sense?

In the case of this experiment, a zero value for 'a' does not make sense.
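
To make the point about parametrization concrete, here is a small sketch with
simulated data (all names and values are hypothetical). It checks that
removing the intercept leaves the fitted probabilities unchanged, and that
the per-level coefficients are log-odds at a = 0. One possible workaround,
which is my assumption rather than something suggested above, is to centre
'a' so that the per-level coefficients refer to the mean of 'a' instead.

```r
# Simulated data: names and values are illustrative assumptions only
set.seed(3)
n <- 200
b <- factor(sample(c("basalt", "granite", "shale"), n, replace = TRUE))
a <- rnorm(n, mean = 15, sd = 5)
shift <- c(basalt = 0, granite = 0.8, shale = -0.8)[as.character(b)]
y <- rbinom(n, 1, plogis(-0.5 + 0.15 * (a - 15) + shift))
d <- data.frame(y = y, a = a, b = b)
d$a_c <- d$a - mean(d$a)   # centred copy of 'a' (an assumption, see above)

fit_int   <- glm(y ~ a + b,       family = binomial, data = d)
fit_noint <- glm(y ~ a + b - 1,   family = binomial, data = d)
fit_cent  <- glm(y ~ a_c + b - 1, family = binomial, data = d)

# Same predictions with and without the intercept
max(abs(fitted(fit_int) - fitted(fit_noint)))

# Per-level coefficients without intercept = intercept + treatment contrast
cf <- coef(fit_int)
b_names <- paste0("b", levels(d$b))
cbind(no_intercept = coef(fit_noint)[b_names],
      from_default = unname(cf["(Intercept)"] +
                            c(0, cf["bgranite"], cf["bshale"])))

# With centred 'a': predicted probability for each level at the mean of 'a'
plogis(coef(fit_cent)[b_names])
```

The slope for 'a' is the same in all three fits; only the meaning of the
'b' coefficients changes.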

Further thoughts welcomed.

Cheers,

Dylan


> > Any guidance would be greatly appreciated.
> >
> > Cheers,



-- 
Dylan Beaudette
Soil Resource Laboratory
http://casoilresource.lawr.ucdavis.edu/
University of California at Davis
530.754.7341


