[R] nonlinear regression-getting the explained variation

Douglas Bates bates at stat.wisc.edu
Sun Nov 26 17:23:17 CET 2006


On 11/23/06, inaki at goisolutions.net <inaki at goisolutions.net> wrote:

> I'm trying to teach myself R, and by the way, re-learning statistics using
> Crawley's "Statistics: an introduction using R".
> I've reached the regression chapter, and when it deals with non-linear
> regresion using the nls library I face the following problem:

> I follow the steps---
> >deer<-read.table("c:\\temp\\jaws.txt",header=T)
> ---data available at
> http://www.bio.ic.ac.uk/research/crawley/statistics/data/
> >attach(deer)
> >names(deer)
> >library(nls)
> >model2<-nls(bone~a*(1-exp(-c*age)),start=list(a=120,c=0.064))
> >summary(model2)
> I've taken away those steps that regard plotting.

But I hope you did actually plot the data when you tried this exercise.

Also, you can, if you wish, read the data from the URL without copying
the data to a local file.

> deer <- read.table("http://www.bio.ic.ac.uk/research/crawley/statistics/data/jaws.txt", header = TRUE)
> model2<-nls(bone~a*(1-exp(-c*age)),deer, start=list(a=120,c=0.064))
> summary(model2)

Formula: bone ~ a * (1 - exp(-c * age))

Parameters:
   Estimate Std. Error t value Pr(>|t|)
a 115.58056    2.84365  40.645  < 2e-16
c   0.11882    0.01233   9.635 3.69e-13

Residual standard error: 13.1 on 52 degrees of freedom

If you want to make things even easier you could use the self-starting
model SSasympOrig as

> summary(fm1 <- nls(bone ~ SSasympOrig(age, Asym, lrc), deer))

Formula: bone ~ SSasympOrig(age, Asym, lrc)

Parameters:
     Estimate Std. Error t value Pr(>|t|)
Asym 115.5805     2.8436   40.65   <2e-16
lrc   -2.1302     0.1038  -20.52   <2e-16

Residual standard error: 13.1 on 52 degrees of freedom

The parameter lrc for that model is the natural logarithm of the rate
constant 'c'.

> And everything goes fine, I understand the steps taken and so, but on the
> text Crawley says "the model... and explained 84.6% of the total variation
> in bone lenght".
> I guess this is linked to the adjusted squared R for linear models, but I
> just can't find how to get it in this case...

Actually it's just an R-squared, not an adjusted R-squared.  It is
being calculated as 1 - RSS/AdjSS where RSS is the residual sum of
squares from the fitted model and AdjSS is the "adjusted sum of
squares" or the sum of squares of the deviations of the observed
responses from their mean. In this case the values are

> sum(resid(model2)^2)   # RSS
[1] 8929.143
>  with(deer, sum((bone - mean(bone))^2))   # AdjSS
[1] 59007.99

so the value of R^2 from the formula is

> with(deer, 1 - sum(resid(model2)^2)/sum((bone - mean(bone))^2))
[1] 0.848679

> I've tried in Statgraphics, and it plots the anova table and r^2 right the
> way, how could I do so in R?

Yes, many software packages that fit nonlinear regression models do
produce an anova table and an R^2 value for any model, even when that
table and the R^2 value do not apply. (I'm assuming that the anova
table is based on dividing the adjusted sum of squares into "model"
and "residual" components so it is essentially the same calculation as
above.)

It would not be difficult to include these values in the summary
output from an nls model in R but we don't because they could be
nonsense.  To see why we must examine what the R^2 should represent.
First you should read what Bill Venables has to say on the general
subject of analysis of variance for linear models.  Use

install.packages("fortunes"); library(fortunes); fortune("curious")

All the summaries like an anova table or an R^2 value are based on the
comparison of two model fits - the model we just fit to the data and
the corresponding "trivial" model. The trivial model is either an
arbitrary constant or zero, depending on whether the model consisting
of a constant only can be embedded in the model we have fit.  If we
can select values of the parameters that turn our model into a
one-parameter model consisting of a constant then the comparison sum
of squares is AdjSS as above because the parameter estimate in the
trivial model is mean(response).  If we can't embed the constant model
in our model then the appropriate comparison sum of squares is the
unadjusted sum of squares

sum(response^2)

Technically we can't embed the model for which the predictions are an
arbitrary constant in this model because of that point at age == 0.
No matter how we change the parameters 'a' and 'c' in
a*(1-exp(-c*age)) we always predict zero at age == 0.  Thus the only
model with constant predictions that is embedded in this model is the
model that predicts 0 for all ages so we should use the unadjusted sum
of squares.  However, if we ignore the point at age == 0 (which
doesn't contribute any information to the model fit) then we can get a
constant model by letting c go to +Inf.  As c goes to +Inf the
conditional estimate of a goes to mean(bone) so the constant model is
embedded in the model we have fit.

We don't include the R^2 or the anova table in the summary output for
a nonlinear regression model because we can't tell if these are the
appropriate formulas.  Look at the model in

?SSfol

That's a common nonlinear regression model in pharmacokinetics and it
can't be reduced to a constant other than zero.  A person with good
calculus skills can discern that but I certainly wouldn't want to
write a computer program to decide whether a given nonlinear model can
be reduced to a nonzero constant.

I don't regard the fact that other programs will produce summary
quantities whether or not they apply to be an advantage.  Such results
only propagate the misconception that statistical modeling is about a
collection of formulas.  It's not.  It's about models and you can't
understand the modeling process if you don't understand the models.

> Thanks in advance, and sorry to bother

You're welcome.  Such questions are not a bother.  You were very clear
about what you are trying to determine and you provided a reproducible
example.

> Iñaki
>
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



More information about the R-help mailing list