[R] fitting a gompertz model through the origin using nls

Fri Mar 6 14:39:43 CET 2009

On 06-Mar-09 11:33:21, Swantje Löbel wrote:
> Dear all!
> I tried to fit Gompertz growth models to describe cummulative
> germination rates using nls. I used the following code:
> 
> germ.model<-nls(percent.germ~a*exp(-b*exp(-k*day)),data=tab,
>                 start=list(a> =100,b=10,k=0.5))
> 
> My problem is that I want that the fitted model goes through the
> origin, since germination cannot start before the experiment was
> started, and y-max should be 100.
> 
> Does anyone know how I can achieve this?
> 
> Thanks a lot in advance! Swantje

Given your observation that germination cannot start before the
experiment has started, it is clear that a Gompertz growth curve
model is unrealistic for your experiment, at least to that extent,
since it is impossible for the Gompertz function to take the
value 0 for any positive (or zero) time.

So you should certainly be asking why you wanted to use that model
in the first place.

That being said, you may wish to try developing an alternative
model along the following lines.

[A]
Suppose (to start with) that, under given experimental conditions,
your seeds will germinate at times which are independent of each
other, and randomly according to the "condition" of the seed and
local variations of the condition of the soil surrounding the seed.

Then the proportion of such seeds which would have germinated by
time 'day' is F(day), where F is the cumulative distribution
function of the distribution of germination time of such seeds
in such conditions. The choice of a possible distribution function
will have to be a valid representation of real-life constraints,
so that F(day) = 0 for 'day' <= 0, and no doubt F(day) = 1 for
'day' > Day.max, a time by which any seed which is going to
germinate will have germinated.

Which raises a further possibility: that an (initially) unknown
proportion of seeds may never germinate. So call this (1-P)
where 'P' is the proportion that will germinate. You may be
confident that you can take P=1. Or not, as the case may be.

Then the "growth curve" will be of the form P*F(day). If you take
P=1, than this automatically has value 0 when day=0, and value 1
when day=Day.max.

As to suggestions for good analyttical forms for F(day), I'm
not going to pretend that I know enough about "seed biology"
to make realistic recommendations. But one might consider fairly
simple functions, readily available in R, which can be adapted
to the kind of scenario suggested above.

For example, suppose that the germination time over the interval
(0,Day.max) is a beta distribution of the form

  (u^(a-1))*((1-u)^(b-1))/B(a,b)

where u = day/Day.max (and B(a,b) is the normalising constant).

Then F(day) is the cumulative distribution of this, available
in R as pbeta(u,a,b). This corresponds to a germination rate
dbeta(u,a,b).

So, with this choice, the function you would fit would  be

  P*pbeta((day/Day.max),a,b)

where you have to fit P, Day.max, a, b. However, the "beta"
suggestion is only off the "top of my head", and some other
choice may be better in terms of biological reality.

At least it has the property of being able to represent a variety
of possible germination behaviours, e.g. for a=1 and b>1, the
germination rate would be highest at day=0, decreasing to 0 as
day approaches Day.max. If a > 1 and b > 1, then germination
rate is zero right at the start, increases to a maximum at
day = Day.max*(a-1)/(a+b-2), and then decreases to zero at
day=Day.max. Similarly, if a>1 and b=1, then the rate is initially
zero, and rises to a maximum at day=Day.max

[B]
A more complicated (and perhaps more realistic) scenario might
require consideration of competition between seeds trying to
germinate, and seed which have already germinated (where the
infant plants may be sucking up resources wouch could have
induced ungerminated seeds to germinate). In that case, the
assumpation underlying approach [A], that the seeds germinate
independently of each other, no longer holds, and the germination
rate at 'day' would depend on the numbers which have already
germinated (and possibly on other factors, such as the locations
of seeds which have germinated, or the lengths of time since
they have germinated since this will be related to the sizes
of those infant plants). This is beginning to move back into
Gompertz territory, since the underlying rationale of the
Gompertz growth curve is that growth rate decreases with
population size because of reduced resources. But it should
be modelled in a different way.

Again, I don't know what would be realistic in "seed biology",
but an approach which would incorporate such considerations
could be:

Other things being equal, a seed has constant "hazard rate"
'a' of germinating, but this this decreases proportionately to
the number n of seeds already germinated (1 - b*n).[*] Then

  (dn/dt) = a*(1-b*n)

so

  n = (1 - exp(-a*b*t))/b

But this does not enforce 100% by Day.max, so you could "fudge"
that in by dividing by a suitable factor:

  n = ((1 - exp(-a*b*t))/b/((1 - exp(-a*b*Day.max))/b)

which again could be fitted using nls, with paramaters a and b
to be fitted (assuming Day.max given).
b[*] One could generalise this "law" to, e.g. a "power law":

  (dn/dt) = a*((1-b*n)^c)

whose solution (satisfying n=0 at t=0) is

  n = (1 - 1/((1 - a*b*(c-1)*t)^(1/(c-1))))/b

(which can possibly be usefully simplified!)

Hoping this helps -- and that maybe people with more knowledge
of these things can contribute more directly relevant suggestions.

Ted.

--------------------------------------------------------------------
E-Mail: (Ted Harding) <Ted.Harding at manchester.ac.uk>
Fax-to-email: +44 (0)870 094 0861
Date: 06-Mar-09                                       Time: 13:39:39
------------------------------ XFMail ------------------------------