[R] nls() fails on a simple exponential fit, when lm() gets it right?

Fri Aug 29 14:24:54 CEST 2008

Toby Marthews wrote:
> Dear R-help,
>
> Here's a simple example of nonlinear curve fitting where nls seems to get
> the answer wrong on a very simple exponential fit (my R version 2.7.2).
>
> Look at this code below for a very basic curve fit using nls to fit to (a)
> a logarithmic and (b) an exponential curve. I did the fits using
> self-start functions and I compared the results with a more simple fit
> using a straight lm() command.
>
> The logarithmic fit works 100% correctly below. The problem is with the
> exponential fit: the nls command gives the wrong values and I have
> completely failed to see why this should happen. I can't see any misake in
> my self-start function, so why the mismatch?? Even worse, if I give nls a
> trivial fit by removing the "#&#&" on line 6, it fails to converge at all
> when the 'simpler' method with lm() easily gives the right answer (r=0.03
> and A=5).
>
> I did the same exp and ln fits using MS Excel and the numbers returned are
> exactly as for the 'simpler ways', so nls is definitely in the wrong, but
> the self-start is almost identical to the one for the logarithmic fit,
> which works perfectly ....
>
> I can't see my mistake at all. Can anyone help??
>   
It's in the theory. A least squares fit to log(Y) is not equivalent to a
least squares fit to Y. The latter assigns more weight to larger values.
Fitting on a log scale is warranted if the error variance after
transformation is independent of the magnitude of the response. This is
pretty patently wrong for your data: look at plot(log(height)~dbh),
which by the way also shows that the exponential model is wrong for
these data.

So EXCEL is definitely in the wrong!!!!!!
> Thanks,
> Toby Marthews
>
>
> traceson=FALSE;maxint=10000;minstepfactor=2^(-30);tolerance=1e-7
> xtxt="DBH in mm";ytxt="Height in m"
> dbh=c(0.9,1.0,1.1,1.2,4.8,4.9,5.0,5.1,8.9,9.0,9.1,9.2,11.8,11.9,12.0,12.1,14.9,15.1,15.2,15.5)
> height=c(5.770089,5.154794,4.888847,6.356770,14.849109,14.973146,15.423816,14.865038,21.335985,20.477624,20.915823,21.886004,23.714724,24.182210,23.954929,23.784659,25.334360,25.411320,26.218614,25.612478)
> #&#&#height=5*exp(0.03*dbh)
>
> logarithmicInit=function(mCall,LHS,data) {
>  xy=sortedXyData(mCall[["x"]],LHS,data)
>  aaa=0.01			#Just guess aaa=0.01 to start the fit
>  bbb=min(xy$y)		#Use the minimum y value as an initial guess for bbb
>  value=c(aaa,bbb)
>  names(value)=mCall[c("aaa","bbb")]
>  return(value)
> }
> fmla=as.formula("~(aaa*log(x)+bbb)")
> SSlogarithmic=selfStart(fmla,initial=logarithmicInit,parameters=c("aaa","bbb"))
>
> exponenInit=function(mCall,LHS,data) {
>  xy=sortedXyData(mCall[["x"]],LHS,data)
>  r=0.01				#Just guess r=0.01 to start the fit
>  A=min(xy$y)		#Use the minimum y value as an initial guess for A
>  value=c(r,A)
>  names(value)=mCall[c("r","A")]
>  return(value)
> }
> fmla=as.formula("~(A*exp(r*x))")
> SSexponen=selfStart(fmla,initial=exponenInit,parameters=c("r","A"))
>
> cat("Logarithmic fit (here, the self-start and the 'simpler' way match):\n")
> tmp=getInitial(height~SSlogarithmic(dbh,aaa,bbb),data=data.frame(dbh,height))
> cat("(Starting values for the logarithmic fit were: aaa=",tmp[1],"and
> bbb=",tmp[2],")\n")
> modelfit=nls(height~SSlogarithmic(x=dbh,aaa,bbb),trace=traceson,control=list(maxiter=maxint,tol=tolerance,minFactor=minstepfactor))
> bfaaa=summary(modelfit)$coefficients[1];bfbbb=summary(modelfit)$coefficients[2]
> cat("Best logarithmic fit is (",ytxt,")=(aaa*log(",xtxt,")+bbb) with
> parameters aaa=",bfaaa,"and bbb=",bfbbb,"\n")
> #Produces: Best logarithmic fit is ( Height in m )=(aaa*log( DBH in mm
> )+bbb) with parameters aaa= 7.551666 and bbb= 4.594367
>
> lnfit=lm(height~log(dbh))	#This is doing the ln fit without a self-start
> function
> cat("Done another, simpler, way, the best logarithmic fit has parameters:
> aaa=",lnfit$coefficients[2],"and bbb=",lnfit$coefficients[1],"\n")
> #Produces: Done another, simpler, way, the best logarithmic fit has
> parameters: aaa= 7.551666 and bbb= 4.594367
>
> cat("Exponential fit (here, the self-start and the 'simpler' way DON'T
> match):\n")
> tmp=getInitial(height~SSexponen(dbh,r,A),data=data.frame(dbh,height))
> cat("(Starting values for the exponential fit: r=",tmp[1],"and
> A=",tmp[2],")\n")
> modelfit=nls(height~SSexponen(x=dbh,r,A),trace=traceson,control=list(maxiter=maxint,tol=tolerance,minFactor=minstepfactor))
> bfr=summary(modelfit)$coefficients[1];bfA=summary(modelfit)$coefficients[2]
> cat("Best exponential fit is (",ytxt,")=(A*exp(r*(",xtxt,"))) with
> parameters r=",bfr,"and A=",bfA,"\n")
> #Produces: Best exponential fit is ( Height in m )=(A*exp(r*( DBH in mm
> ))) with parameters r= 0.07134055 and A= 9.47907
>
> expfit=lm(log(height)~dbh)	#This is doing the exp fit without a self-start
> function
> cat("Done another, simpler, way, the best exponential fit has parameters:
> r=",expfit$coefficients[2],"and A=",exp(expfit$coefficients[1]),"\n")
> #Produces: Done another, simpler, way, the best exponential fit has
> parameters: r= 0.1028805 and A= 6.75045
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>   


-- 
   O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark      Ph:  (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)              FAX: (+45) 35327907