[R] comparing AIC values of models with transformed, untransformed, and weighted variables

Mon Mar 27 17:43:46 CEST 2006

Two comments:

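1) The log-likelihood, and hence the AIC, of a model for log X is not
comparable with those of a model for X.  You need to make an additive
adjustment when you transform, and it is easy to work out from the
definitions: the density of X at x is the density of log X at log x divided
by x, so the log-likelihood on the X scale is the log-scale value minus
sum(log x_i), and the AIC is correspondingly larger by 2 * sum(log x_i).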

2) The AIC given by glm() for weighted models was wrong in versions of R
earlier than 2.3.0 alpha.  I am not sure why you are using a glm for what
appears to be a least-squares fit: use lm() instead (or try 2.3.0 alpha).
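
As an illustration of point 1: a log-log model fitted by least squares to log y (y would be TLA here) has a Gaussian log-likelihood that is a density for log y. To put it on the scale of y, subtract sum(log y_i) from the log-likelihood, i.e. add 2 * sum(log y_i) to the AIC. The following is a minimal sketch in Python, assuming Gaussian errors with sigma^2 estimated by maximum likelihood; the function names and data are hypothetical. In R, AIC() applied to lm(log(y) ~ log(x)) reports the log-scale value, so the adjustment has to be added by hand.

```python
import math


def ols_resid(x, y):
    """Residuals of the least-squares fit y ~ a + b * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    return [yi - (a + b * xi) for xi, yi in zip(x, y)]


def gaussian_aic(resid, n_coef):
    """AIC of a Gaussian fit with sigma^2 estimated by ML (rss / n).

    sigma^2 is counted as one extra parameter on top of n_coef.
    """
    n = len(resid)
    rss = sum(r * r for r in resid)
    loglik = -0.5 * n * (math.log(2 * math.pi) + 1 - math.log(n) + math.log(rss))
    return -2 * loglik + 2 * (n_coef + 1)


def loglog_aic_on_raw_scale(x, y):
    """AIC of log(y) ~ log(x), re-expressed as a model for y itself."""
    ly = [math.log(v) for v in y]
    aic_log = gaussian_aic(ols_resid([math.log(v) for v in x], ly), n_coef=2)
    # Jacobian of y -> log(y): f_y(y) = f_logy(log y) / y, so the
    # log-likelihood drops by sum(log y) and the AIC rises by twice that.
    return aic_log + 2 * sum(ly)


if __name__ == "__main__":
    # Hypothetical data (x standing in for BA, y for TLA).
    x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
    y = [2.1, 3.9, 6.2, 7.8, 10.5, 11.9]
    print("linear, y scale:", gaussian_aic(ols_resid(x, y), n_coef=2))
    print("log-log, y scale:", loglog_aic_on_raw_scale(x, y))
```

Both printed values are then AICs for a model of y and can be compared directly; sigma^2 is counted as a parameter in both, so it does not affect the difference.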
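
For point 2, a weighted least-squares fit with weights w_i (for instance w_i = 1/BA_i, as in the question) corresponds to Gaussian errors with Var(y_i) = sigma^2 / w_i. Its log-likelihood contains a term 0.5 * sum(log w_i); with that term kept, the result is a likelihood for the same response as the unweighted fit, so the AICs can be compared. Below is a minimal Python sketch of that arithmetic. The names and data are hypothetical, the weights are assumed known rather than estimated, and it is not a reconstruction of what glm() or lm() compute internally.

```python
import math


def wls_fit(x, y, w):
    """Weighted least squares for y ~ a + b * x, with Var(y_i) = sigma^2 / w_i."""
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    b = sxy / sxx
    a = my - b * mx
    resid = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    return a, b, resid


def weighted_gaussian_loglik(resid, w):
    """Gaussian log-likelihood with Var(y_i) = sigma^2 / w_i, sigma^2 at its MLE.

    The 0.5 * sum(log w) term keeps the value on the same footing as an
    unweighted fit to the same response.
    """
    n = len(resid)
    wrss = sum(wi * ri * ri for wi, ri in zip(w, resid))
    return 0.5 * (sum(math.log(wi) for wi in w)
                  - n * (math.log(2 * math.pi) + 1 - math.log(n) + math.log(wrss)))


def weighted_aic(x, y, w):
    """AIC of the weighted straight-line fit (2 coefficients + sigma^2)."""
    _, _, resid = wls_fit(x, y, w)
    return -2 * weighted_gaussian_loglik(resid, w) + 2 * 3


if __name__ == "__main__":
    # Hypothetical data; weights 1/x mirror the "weighted by 1/BA" fit.
    x = [1.0, 2.0, 3.0, 4.0, 5.0]
    y = [2.0, 4.1, 5.9, 8.3, 9.7]
    print("unweighted:", weighted_aic(x, y, [1.0] * len(x)))
    print("weights 1/x:", weighted_aic(x, y, [1.0 / v for v in x]))
```

Omitting the 0.5 * sum(log w_i) term shifts the log-likelihood by a constant that depends only on the weights, so an AIC computed without it is not comparable with one from an unweighted fit.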

On Wed, 15 Mar 2006, Patrick Baker wrote:

> Hi there,
>
> I have a question regarding model comparisons that seems simple enough but
> to which I cannot find an answer. I am interested in developing a
> predictive model relating some measure of a tree's stem to the total leaf
> area (TLA) of the tree. Predictor variables might include, for example, the
> total cross-sectional area of the tree (commonly referred to as basal area,
> BA) or the amount of sapwood area (SA), which represents the amount of wood
> involved in active transport of water up the tree to the leaves.
>
> A variety of people have developed these models for a variety of tree
> species in a variety of places around the world. Perhaps not surprisingly,
> different studies have used different model forms in analyzing their data.
> I am interested in comparing the range of models that have been used
> previously (some theoretically derived, others empirically driven) using a
> data set that I have collected (for yet another species in yet another
> place).
>
> To compare the different model forms I had intended to use the AIC.
> However, I have found, again perhaps not surprisingly, that when I use
> log-transformed data, the AIC is substantially lower for a given predictor
> variable. The same issue arises if I use a weighted glm. For example, using
> BA vs TLA, the (rounded) AIC values are 275 for a linear model, 30 for a
> log-log model, and 8 for a glm weighted by 1/BA. I don't believe that these
> vast differences reflect a major improvement in the model, but rather the
> scaling of the variables by transformation or weighting.
>
> What I'd like to get some advice or insight on is whether there is an
> appropriate way to rescale the AIC values to permit comparisons across
> these models. Any suggestions would be very welcome.
>
> Cheers,
> Patrick Baker
>

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595