[R] offset and poisson regression

Renaud Lancelot renaud.lancelot at gmail.com
Mon Jul 27 17:46:48 CEST 2009


You should use offset(log(Gpc)) instead of offset(Gpc)

> options(width = 65)
> fm <- glm(NPe ~ 1 + offset(log(GPc)), family = poisson,data = tab)
> fitted(fm)
       1        2        3        4        6        7        8
3.181818 3.818182 3.818182 4.454545 3.181818 3.818182 3.181818
       9       12       13       14       15       16       18
3.818182 3.181818 3.181818 3.818182 3.818182 3.818182 2.545455
      20       21       22       23       24       27       28
1.909091 3.181818 3.818182 1.909091 3.181818 3.818182 3.181818
      29       30       31       32       33       34       35
3.181818 3.181818 2.545455 3.181818 3.818182 3.181818 3.818182
      36       38       39       40       41       42       44
3.181818 4.454545 3.818182 2.545455 4.454545 5.090909 4.454545
      45       46       48       51       52       54       58
4.454545 4.454545 3.181818 4.454545 3.818182 3.181818 3.818182
      59       62       64       68       71       75       76
5.090909 4.454545 3.818182 3.181818 4.454545 3.818182 3.818182
      81       82       83       85       86       87       88
4.454545 3.818182 3.818182 3.818182 4.454545 3.181818 3.181818
      89       90       91       92       93       94       95
5.090909 5.090909 4.454545 4.454545 3.818182 4.454545 3.818182
      96       98       99      100      101      102      103
3.181818 4.454545 5.090909 3.818182 4.454545 4.454545 3.181818
     104      106      107      108      109      110      111
4.454545 3.181818 5.727273 3.181818 2.545455 4.454545 3.818182
     112      113      114      115      116      117      118
3.818182 3.181818 5.090909 3.181818 4.454545 3.818182 4.454545
     119      121      122      124      125      126      127
4.454545 3.818182 4.454545 5.090909 4.454545 4.454545 3.818182

All the best,

Renaud

2009/7/27 Renaud Scheifler <renaud.scheifler at univ-fcomte.fr>:
> Not sure that the list is the best place for this question, but we are going
> mad with this... We are trying to fit a poisson regression to count data, eg
> the number of fledged youngs of blue tits (NPe) as a function of the clutch
> size (GPc) and other environment variables. Here are the original data
> (dumped) (we just omit the environment variables to simplify):
>
> tab<-
> structure(list(NPe = c(3L, 5L, 2L, 6L, NA, 4L, 4L, 4L, 3L, NA,
> NA, 4L, 5L, 2L, 0L, 5L, NA, 1L, NA, 2L, 5L, 4L, 0L, 4L, NA, NA,
> 6L, 4L, 0L, 4L, 4L, 0L, 6L, 5L, 6L, 3L, NA, 6L, 5L, 3L, 6L, 7L,
> NA, 7L, 6L, 4L, NA, 1L, NA, NA, 7L, 6L, NA, 5L, NA, NA, NA, 0L,
> 0L, NA, NA, 5L, NA, 3L, NA, NA, NA, 5L, NA, NA, 6L, NA, NA, NA,
> 0L, 6L, NA, NA, NA, NA, 5L, 5L, 4L, NA, 4L, 0L, 4L, 5L, 5L, 4L,
> 0L, 0L, 5L, 6L, 5L, 1L, NA, 0L, 7L, 0L, 0L, 3L, 3L, 7L, NA, 0L,
> 6L, 4L, 4L, 5L, 0L, 5L, 4L, 7L, 4L, 7L, 5L, 5L, 0L, NA, 5L, 7L,
> NA, 8L, 7L, 5L, 0L), GPc = c(5L, 6L, 6L, 7L, NA, 5L, 6L, 5L,
> 6L, 6L, 4L, 5L, 5L, 6L, 6L, 6L, 4L, 4L, 4L, 3L, 5L, 6L, 3L, 5L,
> 5L, 7L, 6L, 5L, 5L, 5L, 4L, 5L, 6L, 5L, 6L, 5L, 5L, 7L, 6L, 4L,
> 7L, 8L, 9L, 7L, 7L, 7L, 4L, 5L, 5L, 4L, 7L, 6L, 5L, 5L, 6L, 2L,
> 7L, 6L, 8L, NA, NA, 7L, 6L, 6L, NA, 6L, 6L, 5L, 5L, 5L, 7L, 7L,
> 6L, 6L, 6L, 6L, 7L, 5L, 5L, 7L, 7L, 6L, 6L, 8L, 6L, 7L, 5L, 5L,
> 8L, 8L, 7L, 7L, 6L, 7L, 6L, 5L, 6L, 7L, 8L, 6L, 7L, 7L, 5L, 7L,
> 6L, 5L, 9L, 5L, 4L, 7L, 6L, 6L, 5L, 8L, 5L, 7L, 6L, 7L, 7L, 7L,
> 6L, 7L, 5L, 8L, 7L, 7L, 6L)), .Names = c("NPe", "GPc"), class =
> "data.frame", row.names = c(NA,
> -127L))
>
> It seems logical to insert "clutch size" as an offset term, since we are
> actually interested in the ratio fledged youngs/clutch size. However, the
> final results are quite surprising:
>
> modsr0<-glm(NPe~offset(GPc),family="poisson",data=tab)
>
> if we compute the predictions, we get numbers which looks like a gross
> overestimation of the reality (eg 14.6, 39.7, etc...) -including the fact
> that it implies that one can have more fledged youngs than eggs !:
>
> [1]  0.7  2.0  2.0  5.4  0.7  2.0  0.7  2.0  0.7  0.7  2.0  2.0  2.0  0.3
>  0.1  0.7  2.0
> [18]  0.1  0.7  2.0  0.7  0.7  0.7  0.3  0.7  2.0  0.7  2.0  0.7  5.4  2.0
>  0.3  5.4 14.6
> [35]  5.4  5.4  5.4  0.7  5.4  2.0  0.7  2.0 14.6  5.4  2.0  0.7  5.4  2.0
>  2.0  5.4  2.0
> [52]  2.0  2.0  5.4  0.7  0.7 14.6 14.6  5.4  5.4  2.0  5.4  2.0  0.7  5.4
> 14.6  2.0  5.4
> [69]  5.4  0.7  5.4  0.7 39.7  0.7  0.3  5.4  2.0  2.0  0.7 14.6  0.7  5.4
>  2.0  5.4  5.4
> [86]  2.0  5.4 14.6  5.4  5.4  2.0
>
> Otherwise, if clutch size is inserted as a variable (and not as an offset),
> predictions are much more realistic, with no extreme values :
>
> modsr0<-glm(NPe~GPc,family="poisson",data=tab)
> round(exp(predict(modsr0)),1)
> [1] 3.2 3.7 3.7 4.4 3.2 3.7 3.2 3.7 3.2 3.2 3.7 3.7 3.7 2.7 2.2 3.2 3.7 2.2
> 3.2 3.7 3.2 3.2
> [23] 3.2 2.7 3.2 3.7 3.2 3.7 3.2 4.4 3.7 2.7 4.4 5.3 4.4 4.4 4.4 3.2 4.4 3.7
> 3.2 3.7 5.3 4.4
> [45] 3.7 3.2 4.4 3.7 3.7 4.4 3.7 3.7 3.7 4.4 3.2 3.2 5.3 5.3 4.4 4.4 3.7 4.4
> 3.7 3.2 4.4 5.3
> [67] 3.7 4.4 4.4 3.2 4.4 3.2 6.2 3.2 2.7 4.4 3.7 3.7 3.2 5.3 3.2 4.4 3.7 4.4
> 4.4 3.7 4.4 5.3
> [89] 4.4 4.4 3.7
>
> Can any sound statistician provide a hint about what to do or how to
> interprete this ?
>
> Thanks in advance,
>
> Renaud and Patrick
>
>
>
>
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
>



-- 
Renaud Lancelot
EDEN Project, coordinator
http://www.eden-fp6project.net/
<< EDEN International Conference, Montpellier,  10-12 May 2010 >>
<<   http://international-conference2010.eden-fp6project.net/  >>

UMR CIRAD-INRA "Contrôle des maladies animales exotiques et émergentes"
Joint research unit "Control of emerging and exotic animal diseases"

CIRAD, Campus International de Baillarguet TA A-DIR / B
F34398 Montpellier
http://www.cirad.fr  http://bluetongue.cirad.fr/

Tel.  +33 4 67 59 37 17  -  Fax  +33 4 67 59 37 95
Secr. +33 4 67 59 37 37  - Cell. +33 6 77 52 08 69




More information about the R-help mailing list