[R] strange `nls' behaviour

Tue Nov 13 12:01:41 CET 2007

On Mon, Nov 12, 2007 at 06:59:56PM -0500, Duncan Murdoch wrote:
> On 12/11/2007 2:56 PM, Joerg van den Hoff wrote:
> >On Mon, Nov 12, 2007 at 11:09:21AM -0500, Duncan Murdoch wrote:
> >>On 11/12/2007 9:14 AM, Joerg van den Hoff wrote:
> >>>On Mon, Nov 12, 2007 at 07:36:34AM -0500, Duncan Murdoch wrote:
> >>>>On 11/12/2007 6:51 AM, Joerg van den Hoff wrote:
> >>>>>I initially thought, this should better be posted to r-devel
> >>>>>but alas! no response. 
> >>>>I think the reason there was no response is that your example is too 
> >>>>complicated.  You're doing a lot of strange things (fitfunc as a result 
> >>>>of deriv, using as.name, as.call, as.formula, etc.)  You should 
> >>>>simplify 
> >>>thanks for the feedback.
> >>>
> >>>concerning  "lot  of  strange  things":  OK.  I  thought the
> >>>context might be important ("why, for heaven's sake  do  you
> >>>want  to  do  this!?"), but, then, maybe not. so the easiest
> >>>way to trigger a similar (not the  identical)  behaviour  is
> >>>something like
> >>>
> >>>f <- function (n) {
> >>>  #---------------------------------------------------------
> >>>  #define n data points for a (hardcoded) model:
> >>>  #-----------
> >>>  x <- seq(0, 5, length  = n)
> >>>  y <- 2 * exp(-1*x) + 2; 
> >>>  y <- rnorm(y,y, 0.01*y)
> >>>
> >>>  #the model (same as the above hardcoded one):
> >>>  model <- y ~ a * exp (-b*x) + c
> >>>
> >>>  #call nls as usual:
> >>>  res1 <- try(nls(model, start = c(a=2, b=1, c=2)))
> >>>
> >>>  #call it a bit differently:
> >>>  res2 <- nls(y ~ eval(model[[3]]), start = c(a=2, b=1, c=2))
> >>>
> >>>  list(res1 = res1, res2 = res2)
> >>>  #---------------------------------------------------------
> >>>}
> >>I'd say the problem is relying on the default for the envir parameter to 
> >>eval.  It's reasonable to expect nls to set things up so that terms in 
> >>the model formula are taken from the right place, but when your model 
> >>formula is playing with evaluation, you should be very careful to make 
> >>sure the evaluation takes place in the right context.
> >
> >agreed.
> >
> >>The default for envir is "parent.frame()", and that can't be right: that 
> >>will see local variables in whatever function called it, so if one of 
> >>them has a variable called "model", you'll subscript it.
> >
> >for one, in my actual application, I _do_ specify envir (and
> >remember the above was really  not  the  original  situation
> >where the subsetting of `model' is actually done only in the
> >`deriv' call).   second, if I'm not missing  something,  doing
> >the  evaluation  in  parent.frame()  is  fine  in  the above
> >example and does not explain  the  error,  whose  triggering
> >solely depends on the number of data points used.
> 
> You are missing something.  parent.frame() will be evaluated by eval(), 
> so it refers to whatever function calls eval().  You don't know what 
> function that will be, because it's buried deep within nls().

thanks for taking the time.

I  see. will remember to be even more careful with `eval' in
the future.

> 
> Perhaps you think that the formula
> 
> y ~ eval(model[[3]])
> 
> is the same as the original one?  It's not.  Try printing it:

no, I understand roughly what lazy evaluation means :-)

> 
> > y ~ eval(model[[3]])
> y ~ eval(model[[3]])
> 
> The eval doesn't get called at this point, it gets called for every step 
> of the least squares minimization.  Who knows what parent.frame() means 
> then?
> 
> >>If I were trying to do what you're doing, I would construct the formula 
> >>before the call to nls, and pass that.  I.e. something like the 
> >>following silly code:
> >>
> >>model2 <- model
> >>model2[[3]] <- model[[3]] # The eval() is implicit
> >>res2 <- nls(model2, start = c(a=2, b=1, c=2))
> >
> >
> >I  was  trying that (more or less) in my original example, 
> >I think, but I will reexamine my application and see, whether I can  bypass
> >the problem somehow along these lines.
> >
> >>If you really want to put eval() in a formula, then I can't see how you 
> >>could avoid an explicit specification of the envir parameter.  So I'd 
> >>call this a bug in your code.
> >
> >
> >as I said, this seems right in principle (still, if the call
> >does happen at some well  defined  place  such  as  a  small
> >function local to a user visible one, the eval without envir
> >might be quite OK) but not w.r.t. explaining the nls  crash.
> 
> No, it's not okay.  Your formula is looking in places it shouldn't.  It 
> makes absolutely no sense for your formula to depend on what function 
> evaluates it.  It should only depend on the data that is available in 
> the data argument to nls() or visible in the current environment.

yes, accepted, as above.

> 
> >katherine  mullen  seems  to  have located the exact spot in
> >`nls' where something goes wrong. so, for now I think martin
> >might be right ("bug in my thinking" by assuming I'm allowed
> >to use eval in this place), but otherwise it is  a  property
> >(or bug) of `nls'.
> 
> She may have discovered where the error call was triggered, but things 
> went wrong in this example when you mis-used eval().  If you think your 

I  don't  think  so  (but  may be wrong, of course): I again
stepped through the `nls'  call  with  `debug'  to  see  the
effect of katharine's tentative patch. AFAICS the problem is
not at all related to actually evaluating the  formula,  but
only  to  the  extraction of 'variables' (a.ka. "x" and "y")
from the formula. the result is injected into  `model.frame'
which finally throws an error because the "eval" part of the
formula provided in the offending `nls' call  is  parsed  as
variable  and  then  identified  by  `model.frame'  as  type
"language". I only have access to the  uncommented  dump  of
`nls'  at  the  R prompt and I don't understand the details,
but it _seems_ not really relevant to the actual  fit,  what
ends  ab  in  the  model frame `mf' as long as `model.frame'
does not complain.

I   think   I  now  better  understand  martin's  remark  of
"accidentally" running through: `nls' simply seems to expect
no  `eval'  constructs via it's formula argument. but from a
user perspective I would find it useful, if  this  could  be
altered.  (and  right  know  `nls'  _does_ accept the `eval'
constructs except in the weird exception of number_of_pars+2
==  number_of_data_points) whether the R core group sees fit
to do this, I cannot say. and although I would'nt  like  it,
martin  is  probably  right  that  at  the  very  least some
consistent error handling needs to be added to `nls'.

> original code used it properly, then simplify that down to a minimal 
> example that triggers the "bug".  Don't put things in the example unless 
> they are necessary to trigger the error.  For example:

yes,  examples  should  be  simple.  I  tried  this. I think
further "simplification" is mostly a matter of taste (and  a
question of a few seconds to adjust it to one's liking).

> 
>  - drop the wrapper function f.

this is simply a means of quickly testing different n.

>  - fix n

the problem is not related to a fixed n, but depends on the model chosen.

>  - use fixed data, not random data.

the data are irrelevant for this error (the fit is never started really).

>  - use an extremely simple model, e.g. y ~ a

I chose a simple model, but one with more than one parameter
(which might be speical) and containing `x'.

> Etc.
> 
> Duncan Murdoch
> 

joerg van den hoff

> 
> >
> >joerg van den hoff
> >
> >>Duncan Murdoch
> >>
> >>
> >>>this is without all the overhead of my first example and now
> >>>(since not quite the same) the problem  arises  at  n  ==  3
> >>>where  the  fit  can't  really  procede  (there  are  also 3
> >>>parameters -- the first example was more  relevant  in  this
> >>>respect)  but  anyway  the  second nls-call does not procede
> >>>beyond the parsing phase of `model.frame'.
> >>>
> >>>so,  I  can't  see  room for a real bug in my code, but very
> >>>well a chance that I misuse `nls'  (i.e.  not  understanding
> >>>what really is tolerable for the `model' argument of `nls').
> >>>but  if the latter is not the case, I would think there is a
> >>>bug in `nls'  (or,  actually,  rather  in  `model.frame'  or
> >>>whatever)  when  parsing  the  nls call.
> >>
> >>>
> >>>>it down to isolate the bug.  Thats a lot of work, but you're the one in 
> >>>>the best position to do it.  I'd say there's at least an even chance 
> >>>>that the bug is in your code rather than in nls().
> >>>>
> >>>>And 2.5.0 *is* ancient; please confirm the bug exists in R-patched if 
> >>>>it turns out to be an R bug.
> >>>if  need  be,  I'll  do  that  (if  I  get it compiled under
> >>>macosX). but assuming  that  you  have  R-patched  installed
> >>>anyway, I would appreciate if you would copy the new example
> >>>above or the old one  below  to  your  R-  prompt  and  see,
> >>>whether  it  crashes  with  the same error message if called
> >>>with the special number of data points (3 for the new, 5 for
> >>>the  old  example)?  and  if  so,  maybe  bring  this to the
> >>>attention of d. bates?
> >>>
> >>>
> >>>j. van den hoff
> >>>>Duncan Murdoch
> >>>>
> >>>>
> >>>>
> >>>>so  I  try  it  here.  sory  for  the
> >>>>>lengthy explanation but it seems unavoidable. to quickly see
> >>>>>the problem simply copy the litte example below and execute
> >>>>>
> >>>>>f(n=5)
> >>>>>
> >>>>>which  crashes. called with n !=  5 (and of course n>3 since
> >>>>>there are 3 parameters in the model...) everything is as  it
> >>>>>should be.
> >>>>>
> >>>>>in detail:
> >>>>>I  stumbled over the follwing _very_ strange behaviour/error
> >>>>>when using `nls' which  I'm  tempted  (despite  the  implied
> >>>>>"dangers") to call a bug:
> >>>>>
> >>>>>I've  written a driver for `nls' which allows specifying the
> >>>>>model and the data vectors using arbitrary  symbols.   these
> >>>>>are  internally  mapped  to  consistent names, which poses a
> >>>>>slight complication when using `deriv' to  provide  analytic
> >>>>>derivatives. the following fragment gives the idea:
> >>>>>
> >>>>>#-----------------------------------------
> >>>>>f <- function(n = 4) {
> >>>>>
> >>>>>  x <- seq(0, 5, length  = n)
> >>>>>
> >>>>>  y <- 2 * exp(-1*x) + 2; 
> >>>>>  y <- rnorm(y,y, 0.01*y)
> >>>>>
> >>>>>  model <- y ~ a * exp (-b*x) + c
> >>>>>
> >>>>>  fitfunc <- deriv(model[[3]], c("a", "b", "c"), c("a", "b", "c", "x"))
> >>>>>
> >>>>>  #"standard" call of nls:
> >>>>>  res1 <- nls(y ~ fitfunc(a, b, c, x), start = c(a=1, b=1, c=1))
> >>>>>
> >>>>>  call.fitfunc <- 
> >>>>>  c(list(fitfunc), as.name("a"), as.name("b"), as.name("c"), 
> >>>>as.name("x"))
> >>>>>  call.fitfunc <- as.call(call.fitfunc)
> >>>>>  frml <- as.formula("y ~ eval(call.fitfunc)")
> >>>>>
> >>>>>  #"computed" call of nls:
> >>>>>  res2 <- nls(frml, start = c(a=1, b=1, c=1))
> >>>>>
> >>>>>  list(res1 = res1, res2 = res2)
> >>>>>}
> >>>>>#-----------------------------------------
> >>>>>
> >>>>>the  argument  `n'   defines  the number of (simulated) data
> >>>>>points x/y which are going to be fitted by some model ( here
> >>>>>y ~ a*exp(-b*x)+c )
> >>>>>
> >>>>>the first call to `nls' is the standard way of calling `nls'
> >>>>>when knowing all the variable and parameter names.
> >>>>>
> >>>>>the second call (yielding `res2') uses a constructed formula
> >>>>>in `frml' (which in this example is of course not necessary,
> >>>>>but  in  the general case 'a,b,c,x,y' are not a priori known
> >>>>>names).
> >>>>>
> >>>>>now, here is the problem: the call
> >>>>>
> >>>>>f(4)
> >>>>>
> >>>>>runs fine/consistently, as does every call with n > 5.
> >>>>>
> >>>>>BUT: for n = 5 (i.e. issuing f(5))
> >>>>>
> >>>>>the second fit leads to the error message:
> >>>>>
> >>>>>"Error in model.frame(formula, rownames, variables, varnames, extras, 
> >>>>>extranames,  : invalid type (language) for variable 'call.fitfunc'"
> >>>>>
> >>>>>I  cornered  this  to a spot in `nls' where a model frame is
> >>>>>constructed in variable `mf'.  the parsing/constructing here
> >>>>>seems  simply  to  be messed up for n = 5: `call.fitfunc' is
> >>>>>interpreted as variable.
> >>>>>
> >>>>>I,  moreover, empirically noted that the problem occurs when
> >>>>>the total number of  parameters  plus  dependent/independent
> >>>>>variables  equals  the number of data points (in the present
> >>>>>example a,b,c,x,y).
> >>>>>
> >>>>>so it is not the 'magic' number of 5 but rather the identity
> >>>>>of data vector length and number of parameters+variables  in
> >>>>>the model which leads to the problem.
> >>>>>
> >>>>>this  is  with  2.5.0  (which  hopefully  is  not considered
> >>>>>ancient) and MacOSX 10.4.10.
> >>>>>
> >>>>>any ideas?
> >>>>>
> >>>>>thanks
> >>>>>
> >>>>>joerg
> >>>>>
> >>>>>______________________________________________
> >>>>>R-help at r-project.org mailing list
> >>>>>https://stat.ethz.ch/mailman/listinfo/r-help
> >>>>>PLEASE do read the posting guide 
> >>>>>http://www.R-project.org/posting-guide.html
> >>>>>and provide commented, minimal, self-contained, reproducible code.
> >
> >______________________________________________
> >R-help at r-project.org mailing list
> >https://stat.ethz.ch/mailman/listinfo/r-help
> >PLEASE do read the posting guide 
> >http://www.R-project.org/posting-guide.html
> >and provide commented, minimal, self-contained, reproducible code.
>