[R] glm questions --- saturated model

BXC (Bendix Carstensen) bxc at steno.dk
Tue Mar 16 15:51:50 CET 2004


> -----Original Message-----
> From: r-help-bounces at stat.math.ethz.ch 
> [mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of David Firth
> Sent: Tuesday, March 16, 2004 1:12 PM
> To: Paul Johnson
> Cc: r-help at r-project.org
> Subject: Re: [R] glm questions
> 
> 
> Dear Paul
> 
> Here are some attempts at your questions.  I hope it's of some help.
> 
> On Tuesday, Mar 16, 2004, at 06:00 Europe/London, Paul Johnson wrote:
> 
> > Greetings, everybody. Can I ask some glm questions?
> >
> > 1. How do you find out -2*lnL(saturated model)?
> >
> > In the output from glm, I find:
> >
> > Null deviance:  which I think is  -2[lnL(null) - lnL(saturated)]
> > Residual deviance:   -2[lnL(fitted) - lnL(saturated)]
> >
> > The Null model is the one that includes the constant only (plus offset
> > if specified). Right?
> >
> > I can use the Null and Residual deviance to calculate the "usual model
> > Chi-squared" statistic
> > -2[lnL(null) - lnL(fitted)].
> >
> > But, just for curiosity's sake, what's the saturated model's -2lnL?
> 
> It's important to remember that lnL is defined only up to an additive
> constant.  For example a Poisson model has lnL contributions -mu +
> y*log(mu) + constant, and the constant is arbitrary.  The differencing
> in the deviance calculation eliminates it.  What constant would you
> like to use??
> 
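
Firth's point about the arbitrary constant can be checked numerically. The sketch below assumes simulated Poisson data (the seed, the sample size and the covariate `x` are arbitrary illustrative choices, not from this thread). It computes the Poisson log-likelihood once with the -log(y!) term that R's dpois() includes and once without it. The two versions differ by a constant, but the "usual model Chi-squared" statistic -2[lnL(null) - lnL(fitted)] comes out the same either way, and equals the null deviance minus the residual deviance.

```r
set.seed(1)                            # arbitrary seed, illustrative data only
x <- rnorm(50)                         # hypothetical covariate
y <- rpois(50, exp(0.5 + 0.3 * x))

fit0 <- glm(y ~ 1, family = poisson)   # null model: intercept only
fit1 <- glm(y ~ x, family = poisson)   # fitted model

# Full Poisson log-likelihood: sum of -mu + y*log(mu) - log(y!)
ll_full <- function(mu) sum(dpois(y, mu, log = TRUE))
# Kernel only: the -log(y!) constant is dropped
ll_kernel <- function(mu) sum(-mu + y * log(mu))

lr_full   <- -2 * (ll_full(fitted(fit0))   - ll_full(fitted(fit1)))
lr_kernel <- -2 * (ll_kernel(fitted(fit0)) - ll_kernel(fitted(fit1)))
lr_dev    <- deviance(fit0) - deviance(fit1)   # null minus residual deviance

# The constant cancels in the difference, so all three agree
print(c(full = lr_full, kernel = lr_kernel, deviance = lr_dev))

# Model chi-squared test on (df_null - df_fitted) degrees of freedom
pchisq(lr_dev, df = df.residual(fit0) - df.residual(fit1), lower.tail = FALSE)
```

For a model that contains an intercept, `fit1$null.deviance` holds the same null deviance that glm prints, so `fit1$null.deviance - deviance(fit1)` gives the same statistic without fitting `fit0` separately.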

I have always been under the impression that the constant chosen by glm is
the one that makes the deviance of the saturated model 0, the saturated
model being the one with one parameter per observation in the dataset.
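
For the Poisson family the saturated model reproduces every observation exactly, so its fitted means are the counts themselves (mu_i = y_i). Assuming the same constant that R's dpois() and the poisson() family's AIC use (the -log(y!) term), -2*lnL(saturated) can then be computed directly. The sketch below uses hypothetical simulated counts on the same A x B layout as the example that follows, and checks that the residual deviance equals 2[lnL(saturated) - lnL(fitted)].

```r
set.seed(2)                            # arbitrary seed, illustrative data only
A <- factor(rep(1:5, 3))
B <- factor(rep(1:3, each = 5))
y <- rpois(15, 4)                      # hypothetical counts

fit <- glm(y ~ A + B, family = poisson)

# Saturated model: one parameter per observation, so mu_i = y_i.
# dpois(0, 0) is 1, so cells with y = 0 contribute 0 to the log-likelihood.
ll_sat <- sum(dpois(y, lambda = y, log = TRUE))
ll_fit <- as.numeric(logLik(fit))      # uses the same dpois() constant

m2ll_sat <- -2 * ll_sat                # -2*lnL(saturated)
print(m2ll_sat)

# Residual deviance = -2[lnL(fitted) - lnL(saturated)]
all.equal(deviance(fit), 2 * (ll_sat - ll_fit))
```

Refitting with `y ~ A * B`, which has one parameter per cell, should give a residual deviance of essentially zero, matching the output shown below.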

For example:

> y <- sample( 0:10, 15, replace=TRUE )
> A <- factor( rep( 1:5, 3 ) )
> B <- factor( rep( 1:3, each=5 ) )
> data.frame( y, A, B )
    y A B
1   1 1 1
2   4 2 1
3   3 3 1
4   7 4 1
5   1 5 1
6   0 1 2
7   5 2 2
8   8 3 2
9   4 4 2
10  2 5 2
11  6 1 3
12 10 2 3
13  6 3 3
14  0 4 3
15  1 5 3
> glm( y ~ A + B, family=poisson )

Call:  glm(formula = y ~ A + B, family = poisson) 

Coefficients:
(Intercept)           A2           A3           A4           A5           B2  
     0.6581       0.9985       0.8873       0.4520      -0.5596       0.1719  
         B3  
     0.3629  

Degrees of Freedom: 14 Total (i.e. Null);  8 Residual
Null Deviance:      40.33 
Residual Deviance: 24.07        AIC: 78.9 
> glm( y ~ A * B, family=poisson )

Call:  glm(formula = y ~ A * B, family = poisson) 

Coefficients:
(Intercept)           A2           A3           A4           A5           B2  
  2.535e-15    1.386e+00    1.099e+00    1.946e+00   -1.293e-14   -2.330e+01  
         B3        A2:B2        A3:B2        A4:B2        A5:B2        A2:B3  
  1.792e+00    2.353e+01    2.428e+01    2.274e+01    2.400e+01   -8.755e-01  
      A3:B3        A4:B3        A5:B3  
 -1.099e+00   -2.704e+01   -1.792e+00  

Degrees of Freedom: 14 Total (i.e. Null);  0 Residual
Null Deviance:      40.33 
Residual Deviance: 3.033e-10    AIC: 70.84 
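
The printed output already contains enough to recover -2*lnL(saturated) for this draw. Assuming glm's Poisson AIC is -2*lnL(fitted) + 2p, with the full dpois() constant and p the number of estimated coefficients, the two definitions combine to -2*lnL(saturated) = AIC - 2p - residual deviance. The short calculation below just plugs in the numbers shown above (p = 7 for A + B, p = 15 for A * B). Both models give about 40.8, as they should, since the saturated likelihood does not depend on which model was fitted; the last-digit difference comes from rounding in the printed output.

```r
# Values copied from the printed glm output above
aic   <- c(additive = 78.9,  saturated = 70.84)
p     <- c(additive = 7,     saturated = 15)      # number of coefficients
resid <- c(additive = 24.07, saturated = 3.033e-10)

# AIC = -2*lnL(fitted) + 2p and deviance = -2[lnL(fitted) - lnL(saturated)],
# hence -2*lnL(saturated) = AIC - 2p - deviance
m2ll_sat <- aic - 2 * p - resid
print(round(m2ll_sat, 2))
```

For a fitted Poisson glm object the same quantity is `AIC(fit) - 2 * fit$rank - deviance(fit)`; for families with an estimated dispersion parameter the AIC counts one extra parameter, so this shortcut would need adjusting there.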

----------------------
Bendix Carstensen
Senior Statistician
Steno Diabetes Center
Niels Steensens Vej 2
DK-2820 Gentofte
Denmark
tel: +45 44 43 87 38
mob: +45 30 75 87 38
fax: +45 44 43 07 06
bxc at steno.dk
www.biostat.ku.dk/~bxc
