[R] glm: offset

Prof Brian Ripley ripley at stats.ox.ac.uk
Mon Mar 3 10:16:41 CET 2008


On Mon, 3 Mar 2008, Ted.Harding at manchester.ac.uk wrote:

> On 03-Mar-08 03:19:01, Wensui Liu wrote:
>> Hi, John,
>> my understanding is that you should use log(...) instead of the
>> original scale. Below is the logic in the case of Poisson regression.
>> log(y / offset) = x'b
>> => log(y) - log(offset) = x'b
>> => log(y) = x'b + log(offset)
>
> Well, this is where it gets interesting!
> The above statement of the "logic" begs the question (i.e. assumes
> the answer).
>
> I would go according to the general interpretation of "offset"
> in LM and GLM modelling -- an "offset" is
>
>  "a quantitative variable whose regression coefficient
>   is known to be 1"
>  [McCullagh and Nelder (1983) "Generalized Linear Models",
>    page 138]

Yes, and that is how it is defined in R too -- see ?offset.

The issue is more what you want to do with the offset.  In a Poisson 
regression, the offset is most often used to include exposure time, the 
Poisson model being for log rate.  Thus

mu = lambda*T, log(lambda) = Xb

means

log(mu) = Xb + log(T)

is the model for Poisson counts of occurrences in time intervals and hence 
the offset is log(T).
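
A minimal sketch of the above on simulated data. The variable names
(exposure, x) and the true coefficients (0.3, 0.7) are made up for
illustration. If log(exposure) is entered as an ordinary covariate
instead of an offset, its estimated coefficient should come out close
to 1, which is the sense in which the offset has a known coefficient.

```r
set.seed(1)
n <- 500
x <- rnorm(n)
exposure <- runif(n, 0.5, 10)       # hypothetical exposure times (the T above)
lambda <- exp(0.3 + 0.7 * x)        # rate per unit time: log(lambda) = Xb
y <- rpois(n, lambda * exposure)    # counts with mean mu = lambda * T

# Offset on the log scale: log(mu) = Xb + log(T)
fit_offset <- glm(y ~ x + offset(log(exposure)),
                  family = poisson(link = "log"))

# Same model, but with the coefficient of log(T) estimated rather than fixed at 1
fit_free <- glm(y ~ x + log(exposure),
                family = poisson(link = "log"))

coef(fit_offset)
coef(fit_free)
```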

As ?offset hints, there are examples under ?glm (taken from MASS) and for 
dataset Insurance in package MASS.  One with non-logged offset and one 
with ....
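
A sketch of how an offset can be specified with those datasets,
assuming package MASS is installed. The formulas follow my recollection
of the help-page examples and should be checked against ?glm and
?Insurance in your version; the object names ins.fit and anorex.fit
are mine.

```r
library(MASS)

# Poisson counts of claims, with the number of policyholders as exposure:
# the offset goes in on the log scale because the link is log.
ins.fit <- glm(Claims ~ District + Group + Age + offset(log(Holders)),
               family = poisson, data = Insurance)

# Gaussian model with identity link: the offset is on the response
# scale, so Prewt is used untransformed.
anorex.fit <- glm(Postwt ~ Prewt + Treat + offset(Prewt),
                  family = gaussian, data = anorexia)

coef(ins.fit)
coef(anorex.fit)
```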



> Since the GLM for a Poisson regression with log link is to model
>
>  L = log(mu) = a + b1*X1 + b2*X2 + ...
>
> where mu is the Poisson mean, and where X1, X2, ... are the raw
> (untransformed, unless you have other reasons for transforming
> them prior to bringing them into the regression) explanatory
> variables, if X1 is the variable you wish to use as "offset"
> in the above sense then it should be used un-transformed.
> On this basis, the answer to John Sorkin's question should be:
> don't use log(NumUniqPt), use NumUniqPt.
>
> There's a potential confusion here in that presumably
> "NumUniqPt" may be a positive variable whose distribution
> in the data may be skew, i.e. the sort of variable that
> you may feel urged to take the log of before using it.
>
> But that would be an "other reason" in the sense of my
> comment above.
>
> After all, suppose "NumUniqPt" denoted a variable in the
> data that could take negative values. Would you be happy
> to use log(NumUniqPt) in that case?
>
> Best wishes to all,
> Ted.
>
>
>> On Sun, Mar 2, 2008 at 10:01 PM, John Sorkin
>> <jsorkin at grecc.umaryland.edu> wrote:
>>> R 2.6.0
>>>  Windows XP
>>>
>>>  A question about running a generalized linear model.
>>>
>>>  I am running a glm with
>>>  (1) a poisson distribution and a log link:
>>>    family=poisson(link = "log")
>>>  and an offset.
>>>  I would like to know if I should express the offset as the log of the
>>>  offset value, i.e.
>>>  offset=log(NumUniqPt)
>>>  or as:
>>>  offset=NumUniqPt
>>>
>>>  I suspect I need to use the log, but I can't find any discussion of
>>>  this in MASS 1994 or on the man page for glm.
>>>  Thanks
>>>  John
>>>

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595


