[R] logLik.lm()

Fri Jun 27 22:17:02 CEST 2003

Hi:

This is not a typical R posting, but I was quite surprised to read 
Prof. Ripley's comment about the inappropriate use of AIC to 
compare "non-nested" models. While it is indeed true that Akaike 
(1973) develops AIC for nested models, i.e. models that can be 
obtained from one another by restrictions on the parameters, it is 
not at all obvious to me that it can't be used for non-nested cases. 

To quote Stone (1977, JRSS B): "Akaike's derivation of AIC was for 
hierarchical models but, as he finally remarked, this restriction is 
unnecessary."  I don't know where Akaike made this remark - I couldn't 
see it in his 1973 paper - but AIC has indeed been used in various 
situations where the models are non-nested. From the motivation of AIC 
as an estimator of the Kullback-Leibler divergence of the assumed 
model from the "true" model, it is not clear that the models have to be 
nested. 
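
To make the question concrete, here is a minimal sketch of the kind of 
non-nested comparison at issue, using the data from the quoted message 
below. It is only an illustration, not a claim that the comparison is 
justified. As an assumption on my part, a lognormal model stands in for 
the Gamma GLM of the original question, so that both fits are genuine 
maximum-likelihood fits. The names fit.norm and fit.lnorm are 
illustrative. The log-likelihood of the log(y) model is moved to the 
scale of y by the change-of-variables term -sum(log(y)), so that both 
AIC values refer to the same response.

```r
# Data from the quoted example.
x <- c(43.22,41.11,76.97,77.67,124.77,110.71,144.46,188.05,171.18,
       204.92,221.09,178.21,224.61,286.47,249.92,313.19,332.17,374.35)
y <- c(5.18,12.47,15.65,23.42,27.07,34.84,31.03,30.87,40.07,57.36,
       47.68,43.40,51.81,55.77,62.59,66.56,74.65,73.54)

# Two non-nested models for the same response y (illustrative names):
fit.norm  <- lm(y ~ x)        # y      ~ Normal(a + b*x, sigma^2)
fit.lnorm <- lm(log(y) ~ x)   # log(y) ~ Normal(a + b*x, sigma^2)

# logLik() evaluates each fit at the MLE of sigma^2.
ll.norm <- as.numeric(logLik(fit.norm))

# The density of y under the second model is dnorm(log(y), ...) / y,
# so subtract sum(log(y)) to express its log-likelihood on the y scale.
ll.lnorm <- as.numeric(logLik(fit.lnorm)) - sum(log(y))

# Each model has 3 parameters: intercept, slope and sigma^2.
k <- 3
aic <- c(normal = -2 * ll.norm + 2 * k, lognormal = -2 * ll.lnorm + 2 * k)
aic
```

Neither model is a restriction of the other, yet both AIC values are 
computed from likelihoods for the same data y, which is all the 
Kullback-Leibler motivation appears to need.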

Any thoughts or comments on this issue?

Best,
Ravi.

----- Original Message -----
From: Prof Brian Ripley <ripley at stats.ox.ac.uk>
Date: Wednesday, June 25, 2003 2:59 pm
Subject: Re: [R] logLik.lm()

> Your by-hand calculation is wrong -- you have to use the MLE of sigma^2.
>
> sum(dnorm(y, y.hat, sigma * sqrt(16/18), log=TRUE))
> 
> Also, this is an inappropriate use of AIC: the models are not nested, and
> Akaike only proposed it for nested models.  Next, the gamma GLM is not a
> maximum-likelihood fit unless the shape parameter is known, so you can't
> use AIC with such a model using the dispersion estimate of shape.
> 
> The AIC output from glm() is incorrect (even in that case, since the
> shape is always estimated by the dispersion).
> 
> On Wed, 25 Jun 2003, Edward Dick wrote:
> 
> > Hello,
> > 
> > I'm trying to use AIC to choose between 2 models with
> > positive, continuous response variables and different error
> > distributions (specifically a Gamma GLM with log link and a
> > normal linear model for log(y)). I understand that in some
> > cases it may not be possible (or necessary) to discriminate
> > between these two distributions. However, for the normal
> > linear model I noticed a discrepancy between the output of
> > the AIC() function and my calculations done "by hand."
> > This is due to the output from the function logLik.lm(),
> > which does not match my results using the dnorm() function
> > (see simple regression example below).
> > 
> > x <- c(43.22,41.11,76.97,77.67,124.77,110.71,144.46,188.05,171.18,
> >        204.92,221.09,178.21,224.61,286.47,249.92,313.19,332.17,374.35)
> > y <- c(5.18,12.47,15.65,23.42,27.07,34.84,31.03,30.87,40.07,57.36,
> >        47.68,43.40,51.81,55.77,62.59,66.56,74.65,73.54)
> > test.lm <- lm(y~x)
> > y.hat <- fitted(test.lm)
> > sigma <- summary(test.lm)$sigma
> > logLik(test.lm)
> > # `log Lik.' -57.20699 (df=3)
> > sum(dnorm(y, y.hat, sigma, log=T))
> > # [1] -57.26704
> 
> -- 
> Brian D. Ripley,                  ripley at stats.ox.ac.uk
> Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
> University of Oxford,             Tel:  +44 1865 272861 (self)
> 1 South Parks Road,                     +44 1865 272866 (PA)
> Oxford OX1 3TG, UK                Fax:  +44 1865 272595
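
For reference, the discrepancy in the quoted thread can be reproduced 
with the sketch below. It assumes only base R and the data from the 
quoted example. The names sigma.mle, ll.hand and aic.hand are 
illustrative. summary(test.lm)$sigma divides the residual sum of squares 
by the residual degrees of freedom (16), whereas the MLE divides by 
n = 18, which is where the factor sqrt(16/18) comes from.

```r
# Data from the quoted example.
x <- c(43.22,41.11,76.97,77.67,124.77,110.71,144.46,188.05,171.18,
       204.92,221.09,178.21,224.61,286.47,249.92,313.19,332.17,374.35)
y <- c(5.18,12.47,15.65,23.42,27.07,34.84,31.03,30.87,40.07,57.36,
       47.68,43.40,51.81,55.77,62.59,66.56,74.65,73.54)

test.lm <- lm(y ~ x)
n <- length(y)

# MLE of sigma: residual sum of squares divided by n, not by n - 2.
sigma.mle <- sqrt(sum(residuals(test.lm)^2) / n)

# Equivalent form used in the reply: rescale the usual estimate.
sigma.rescaled <- summary(test.lm)$sigma * sqrt(df.residual(test.lm) / n)

# By-hand log-likelihood at the MLE, compared with logLik().
ll.hand <- sum(dnorm(y, fitted(test.lm), sigma.mle, log = TRUE))
all.equal(ll.hand, as.numeric(logLik(test.lm)))

# By-hand AIC with 3 parameters (intercept, slope, sigma^2).
aic.hand <- -2 * ll.hand + 2 * 3
all.equal(aic.hand, AIC(test.lm))
```

Both all.equal() calls should return TRUE, since logLik() for an lm 
fit evaluates the normal likelihood at the MLE of sigma^2.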