[R] Nonlinear statistical modeling -- a comparison of R and AD Model Builder

Mon Nov 27 00:41:27 CET 2006

Douglas Bates wrote:

>
     snip
>
> Don't you find it somewhat disingenuous that you publish a comparison
> between the AD Model Builder software that you sell and R - a
> comparison that shows a tremendous advantage for your software - and
> then you write "I am not proficient in R"?
>
I think there is a misunderstanding here. I did not pick this example.
As I said, it was undertaken by Schnute and his colleagues. I had nothing
to do with it except, of course, to sell them my software. However, as I
stated, at that time Splus could not run the model without crashing after
a while, so no comparison was possible. I was aware of those results
and decided to use R to complete the comparison. I don't see why Schnute
would want to unfairly promote my software. I believe he was simply
looking for the best tool for the job he had in mind.

> Had you been proficient in R you might have known about the symbolic
> differentiation capabilities, specifically the deriv function, that
> have been part of the S language since the late 1980s.  I believe that
> the 'AD' in "AD model builder" stands for automatic differentiation,
> which is actually something that John Chambers and I discussed at
> length when we were developing nonlinear modeling methods for S.  In
> the end we went with symbolic differentiation rather than automatic
> differentiation because we felt that symbolic was more flexible.
>
Yes, I am aware of the symbolic differentiation capabilities.
I have checked out the deriv function, and it does not seem to be capable
of calculating derivatives for a model of this complexity in an
efficient manner. Of course, I could be wrong.

There is a paper by Andreas Griewank (whose title I have forgotten, but
perhaps some list member recalls) from around 1990 in which he compares
symbolic and automatic differentiation for a simple model of an oil
reservoir. He demonstrates quite decisively that symbolic differentiation
is not the way to go.
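To make the symbolic-versus-automatic distinction concrete, here is a minimal forward-mode automatic differentiation sketch using dual numbers, written in Python so it runs on its own. Each value carries its derivative along with it, and every arithmetic operation updates both by the chain rule, so the derivative of a function containing a loop comes out exact without ever building a symbolic expression. R's deriv(), by contrast, differentiates a single expression symbolically; as far as I know it works from a fixed table of functions and does not differentiate through loops. The Dual class and total_catch (a toy Baranov catch equation for one cohort fished at a constant rate F, with made-up N0 and M) are hypothetical illustrations, not the Schnute et al. model and not how ADMB is implemented; ADMB, as I understand it, uses reverse mode, which scales better when there are many parameters.

```python
import math


class Dual:
    """Dual number val + der*eps with eps**2 = 0; der carries d/dx."""

    def __init__(self, val, der=0.0):
        self.val = val
        self.der = der

    @staticmethod
    def lift(other):
        # Treat plain numbers as constants (derivative 0).
        return other if isinstance(other, Dual) else Dual(other, 0.0)

    def __add__(self, other):
        o = Dual.lift(other)
        return Dual(self.val + o.val, self.der + o.der)

    __radd__ = __add__

    def __neg__(self):
        return Dual(-self.val, -self.der)

    def __sub__(self, other):
        return self + (-Dual.lift(other))

    def __rsub__(self, other):
        return Dual.lift(other) + (-self)

    def __mul__(self, other):
        o = Dual.lift(other)
        # Product rule.
        return Dual(self.val * o.val, self.der * o.val + self.val * o.der)

    __rmul__ = __mul__

    def __truediv__(self, other):
        o = Dual.lift(other)
        # Quotient rule.
        return Dual(self.val / o.val,
                    (self.der * o.val - self.val * o.der) / (o.val * o.val))


def exp(x):
    """exp that also propagates derivatives for Dual arguments."""
    if isinstance(x, Dual):
        e = math.exp(x.val)
        return Dual(e, e * x.der)  # chain rule: d exp(u) = exp(u) du
    return math.exp(x)


def total_catch(F, n_years=10, N0=1000.0, M=0.2):
    """Hypothetical toy: one cohort fished at constant rate F (Baranov catch)."""
    N = N0
    catch_sum = 0.0
    for _ in range(n_years):
        Z = M + F                                   # total mortality
        catch_sum = catch_sum + F / Z * N * (1 - exp(-Z))
        N = N * exp(-Z)                             # survivors to next year
    return catch_sum


result = total_catch(Dual(0.3, 1.0))  # seed dF/dF = 1
print(result.val, result.der)         # catch and d(catch)/dF at F = 0.3
```

The same total_catch function runs unchanged on plain floats, which is the practical appeal Dave describes: someone modifying the model does not have to re-derive anything by hand.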

> This is not to say that automatic differentiation isn't a perfectly
> legitimate technique.  However, my recollection is that it would have
> required extensive revisions to the arithmetic expression evaluator,
> which is already very tricky code because of the "recycling rule" and
> the desire to shield users from knowledge of the internal
> representations and such details as whether you are using logical or
> integer or double precision operands or a combination.
>
> If you want to see these details you can, of course, examine the
> source code.  I don't believe we would have the opportunity to examine
> how you implemented automatic differentiation.
>
> I must also agree with Spencer Graves that when I start reading a
> description of a nonlinear model with over 100 parameters, the example
> that you chose, I immediately start thinking of nonlinear mixed
> effects models.  In my experience the only way in which a nonlinear
> model ends up with that number of parameters is through applying an
> underlying model with a low number of parameters to various groups
> within the data.  Table 2 in the Schnute et al. paper to which you
> make reference states that the number of parameters in the model is T
> + A + 5 where T is the number of years of data and A is the number of
> age classes.  To me that looks a lot like a nonlinear mixed effects
> model.
>
I agree that this makes a good nonlinear random effects example. Of
course, 10 years ago AD Model Builder did not have that capability. It
now does, and my colleague Hans Skaug has modified the code to
incorporate random effects. I believe the model converges in a few
minutes. He will report the results, and hopefully they can be compared
to nlme or any other software in R that can carry out the calculations.
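For readers wondering what incorporating random effects involves computationally: the random effects have to be integrated out of the likelihood. A common device, and to my understanding the one ADMB's random-effects extension is built on, is the Laplace approximation: find the mode of the joint density in the random effect and replace the integrand by a Gaussian with the same mode and curvature. The Python sketch below does this for a single scalar random effect. The names laplace_log_marginal and gaussian_group and the data values are hypothetical; the all-Gaussian toy model is chosen because the Laplace approximation is exact there, which makes the result easy to check against numerical integration.

```python
import math


def laplace_log_marginal(g, dg, d2g, u0=0.0, tol=1e-10, max_iter=50):
    """Laplace approximation to log( integral of exp(-g(u)) du ), u scalar.

    g(u) is the negative joint log density -log p(y, u) for one group;
    dg and d2g are its first and second derivatives with respect to u.
    """
    u = u0
    for _ in range(max_iter):
        step = dg(u) / d2g(u)          # Newton step toward the mode
        u -= step
        if abs(step) < tol:
            break
    curvature = d2g(u)
    # Replace exp(-g) by a Gaussian with the same mode and curvature.
    return -g(u) + 0.5 * math.log(2.0 * math.pi) - 0.5 * math.log(curvature)


def gaussian_group(y, mu, sigma_u, sigma_e):
    """Hypothetical group: y_j = u + e_j, u ~ N(mu, sigma_u^2), e_j ~ N(0, sigma_e^2)."""
    n = len(y)
    const = (0.5 * n * math.log(2.0 * math.pi * sigma_e ** 2)
             + 0.5 * math.log(2.0 * math.pi * sigma_u ** 2))

    def g(u):
        return (sum((yj - u) ** 2 for yj in y) / (2.0 * sigma_e ** 2)
                + (u - mu) ** 2 / (2.0 * sigma_u ** 2) + const)

    def dg(u):
        return -sum(yj - u for yj in y) / sigma_e ** 2 + (u - mu) / sigma_u ** 2

    def d2g(u):
        return n / sigma_e ** 2 + 1.0 / sigma_u ** 2

    return g, dg, d2g


y = [1.2, 0.7, 1.9, 1.1]
g, dg, d2g = gaussian_group(y, mu=0.5, sigma_u=1.0, sigma_e=0.5)
print(laplace_log_marginal(g, dg, d2g))  # log-likelihood contribution of this group
```

With non-Gaussian data (Poisson counts, say) the same function still runs, but the result is then an approximation rather than exact. In a model with many group-level effects (one per year, for example) the idea is the same with a vector of random effects and a Hessian matrix in place of the scalar curvature, and that is where exact derivatives from automatic differentiation become valuable.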

> Also, your choice of subject heading for your message seems
> deliberately provocative.  You seem to be implying that you are
> discussing a comparison of AD Model Builder and R on all aspects of
> nonlinear statistical modeling but you are only discussing one
> comparison on simulated data using a model from the applications area
> for which you wrote AD Model Builder.  Then you follow up by saying "I
> am not proficient in R" and your results for R are from applying code
> that someone else gave you.
>
> It seems that ADMB had a bit of a "home-field advantage" in this
> particular comparison.
I don't quite get your point. Of course I am only going to present
examples where I believe ADMB is (far) superior to R. Otherwise I would
just be wasting everyone's time. ADMB is much more narrowly focused than
R. I think that people can examine the arguments and make up their own
minds.
>
> I view nonlinear statistical modeling differently.  I have had a bit
> of experience in the area and I find that I want to examine the data
> carefully, usually through plots, before I embark on fitting
> complicated models.  I like to have some assurance that the model
> makes sense in the context of the data.  (In your example you don't
> need to worry about appropriateness of the model because the data were
> simulated.) I would never try to fit a nonlinear model with 100
> parameters to data without carefully examining the data, and
> especially selected subsets of the data, first.  For this the
> flexibility of the S language and tools like lattice graphics that
> were developed in this language are invaluable to me.  The flexibility
> of data manipulation and graphics for interactive exploration of data
> is what attracted me to S in the first place.
>
> I realize that for many people the area of nonlinear statistical
> modeling is reduced to "Fit this model to these data and don't ask any
> questions.   Just give me parameter estimates and p-values."  If that
> is your situation then it would make sense to use software that gets
> you those estimates as quickly as possible with a minimum of effort.
> I'm just happy that I get to turn down people who ask me to do that.
> I like the fact that I can spend my time asking questions about the
> data and of the data.
>
    snip

-- 
David A. Fournier
P.O. Box 2040,
Sidney, B.C. V8L 3S3
Canada
Phone/FAX 250-655-3364
http://otter-rsch.com

Douglas Bates wrote:
> On 11/24/06, dave fournier <otter at otter-rsch.com> wrote:
>>
>>
>>        Dave
>>  > Did you try supplying gradient information to nlminb?  (I note that
>>  > nlminb is used for the optimization, but I don't see any gradient
>>  > information supplied to it.) I would suspect that supplying gradient
>>  > information would greatly speed up the computation (as you note in
>>  > comments at http://otter-rsch.ca/tresults.htm.)
> 
>> Actually you should probably ask Norm Olsen these questions.
>> I am not proficient in R and am just using his code.
> 
>> However, I can say that providing derivatives for such a model is a
>> highly nontrivial exercise. As I said in my posting, the R script and
>> data are available to anyone who feels that the exercise was not carried
>> out properly and would like to improve on it. Also, one does not need
>> to provide derivatives to the AD Model Builder program.
>>
>> Finally, suppose that you are very good at calculating derivatives and
>> manage to get them right. Then someone else comes along who wants to
>> modify the model. Unless they are also very good at calculating
>> derivatives, there will be trouble.
>>
>>  >
>>  > I'm curious -- when you say "R may not be a suitable platform for
>>  > development for such models", what aspect of R do you feel is lacking?
>>  > Is it the specific optimization routines available, or is it some other
>>  > more general aspect?
>>
>> 2 seconds vs 90 minutes. For a real problem of this type the timings
>> would probably be something like 10 minutes vs more than 2,700 minutes.
>>
>>  >
>>  > Also, another optimization algorithm available in R is the "L-BFGS-B"
>>  > method for optim() in the MASS package.  I've had extremely good
>>  > experiences with using this code in S-PLUS.  It can take box
>>  > constraints, and can use gradient information.  It is my first choice
>>  > for most optimization problems, and I believe it is very widely used.
>>  > Did you try using that optimization routine with this problem?
>>  >
>>  > -- Tony Plate
>>  >
>>  > dave fournier wrote:
>>  >> There has recently been some discussion on the list about
>>  >> AD Model Builder and the suitability of R for constructing the
>>  >> types of models used in fisheries management.
>>  >>
>>  >>    https://stat.ethz.ch/pipermail/r-help/2006-January/086841.html
>>  >>
>>  >>    https://stat.ethz.ch/pipermail/r-help/2006-January/086858.html
>>  >>
>>  >> I think that many R users underestimate the numerical challenges
>>  >> that some of the typical nonlinear statistical models used in
>>  >> different fields present. R may not be a suitable platform for
>>  >> development for such models.
>>  >>
>>  >> Around 10 years ago John Schnute, Laura Richards, and Norm Olsen
>>  >> with Canadian federal fisheries undertook an investigation
>>  >> comparing various statistical modeling packages for a simple
>>  >> age-structured statistical model of the type commonly used in
>>  >> fisheries. They compared AD Model Builder, Gauss, Matlab, and
>>  >> Splus. Unfortunately, a working model could not be produced with Splus,
>>  >> so its times could not be included in the comparison. It is possible
>>  >> to produce a working model with the present-day version of R, so that
>>  >> R can now be directly compared with AD Model Builder for this type
>>  >> of model.
>>  >>
>>  >> I have put the results of the test together with the original
>>  >> Schnute and Richards paper and the working R and AD Model Builder
>>  >> codes on Otter's web site
>>  >>
>>  >>      http://otter-rsch.ca/tresults.htm
>>  >>
>>  >> The results are that AD Model Builder is roughly 1000 times faster
>>  >> than R for this problem. ADMB takes about 2 seconds to converge while
>>  >> R takes over 90 minutes.
>>  >>
>>  >> This is a simple toy example. Real fisheries models are often
>>  >> hundreds of times more computationally intensive than this one.
>>  >>
>>  >>         Cheers,
>>  >>
>>  >>          Dave
>>  >
>>
> 
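Tony Plate's question about supplying gradients to nlminb, and Dave's point that hand-coding them is hard, both come down to the cost of one gradient. As I understand the R documentation, when no gradient is supplied, nlminb() and optim() (which in R lives in the base stats package, called as optim(par, fn, gr = NULL, method = "L-BFGS-B", ...)) fall back on finite-difference approximations. The Python sketch below counts evaluations for a forward-difference gradient; objective, objective_grad, count_calls and fd_gradient are hypothetical names, and the quadratic objective is a stand-in, not the Schnute model.

```python
def count_calls(f):
    """Wrap an objective function so every evaluation is counted."""
    def wrapped(x):
        wrapped.calls += 1
        return f(x)
    wrapped.calls = 0
    return wrapped


def fd_gradient(f, x, h=1e-6):
    """Forward-difference gradient: 1 base evaluation + 1 per parameter."""
    fx = f(x)
    grad = []
    for i in range(len(x)):
        x_step = list(x)
        x_step[i] += h
        grad.append((f(x_step) - fx) / h)
    return grad


def objective(x):
    """Hypothetical stand-in for a negative log-likelihood."""
    return sum((xi - 1.0) ** 2 for xi in x)


def objective_grad(x):
    """Hand-coded gradient of the stand-in objective."""
    return [2.0 * (xi - 1.0) for xi in x]


n_par = 100                      # roughly the model size discussed in the thread
x0 = [0.0] * n_par
f = count_calls(objective)
approx = fd_gradient(f, x0)
exact = objective_grad(x0)
print(f.calls)                   # 101 evaluations for ONE gradient
print(max(abs(a - e) for a, e in zip(approx, exact)))  # truncation error, about h
```

With around 100 parameters, every optimizer step therefore pays roughly 100 extra model evaluations, and the result is only accurate to about h. Reverse-mode automatic differentiation returns the full gradient at a cost that is a small constant multiple of one evaluation, independent of the number of parameters, which plausibly accounts for a large part of the timing gap reported here.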
