[R] NLS sensitivity to start= values or poles in data range

Fri May 11 22:33:12 CEST 2012

Greetings R-help!  I'm fairly new to R and am trying to expand my knowledge beyond using R for simple summary statistics and basic tests.  To that end I am attempting to write an interactive R-script that will perform a general rational function fit to a given dataset based on the example given at http://www.itl.nist.gov/div898/handbook/pmd/section6/pmd64.htm.

The problem that I seem to have is that the fitted function is nearly identical to the NIST results when I use their initial values (these are representative values that are close to actual data, but not taken from the dataset) for generating the starting point for the nls() routine.  However if I try to use actual data points extracted from the dataset for the preliminary fit, the fitted function can often fail to approximate the data.

One culprit that I suspect may be that a pole (real root of the denominator) of the initial fit is laying in the range of the data, since the NIST starting values produce a preliminary fit that does not have any poles over the data range.  Unfortunately, I'm not familiar enough with the nls() routines to pass judgement.

The attached code reads in the data from the web, performs the preliminary fit using lm(), passes that to nls() and plots the data, starting values, preliminary fit and final fit. The code block in the middle defining indsubset and depsubset contain three different options for the starting values. I thank you all in advance for any help or insights.

Simplified code follows...  Actual code for the interactive version is available at http://www.schrodingersghost.com/RationalFit-0.2.R

  # Start of Code	
  # Read and plot the data

  InputData <- read.table("http://itl.nist.gov/div898/strd/nls/data/LINKS/DATA/Hahn1.dat",
				  header=0,
				  row.names=NULL, 
				  col.names=c("CTE","K"),
				  skip=60)
  indvar <- 2
  depvar <- 1

  indvarname <- names(InputData)[indvar]
  depvarname <- names(InputData)[depvar]

  attach(InputData)
  matplot(InputData[indvar],
          InputData[depvar], 
          type='p', pch=4, 
          xlab = indvarname, 
          ylab = depvarname, 
          main = bquote(paste(.(depvarname), " vs ", .(indvarname)))
          )

  # Specify values for determining nls start

  ########################################
  # Actual Data equally spaced in data set

  #indsubset <- c(14.13, 72.77, 163.48, 273.13, 419.51, 549.53, 850.98)
  #depsubset <- c(0.08, 7.267, 13.974, 16.337, 17.767, 18.61, 21.085)

  # Actual Data closest values to NIST starting values

  #indsubset <- c(14.13,31.30,40.03,50.24,120.25,202.14,845.97)
  #depsubset <- c(0.08,1.089,2.241,3.697,12.005,15.190,20.830)

  # NIST Starting Values

  indsubset <- c(10,30,40,50,120,200,800)
  depsubset <- c(0,2,3,5,12,15,20)
  #
  #########################################

  points(indsubset,depsubset, type="p", pch=19, col="red")

  # Generate starting values for nls per NIST process

  initmodel <- (depsubset ~   I(indsubset^1) 
					+ I(indsubset^2) 
					+ I(indsubset^3) 
					+ I(-1 * indsubset^1 * depsubset) 
					+ I(-1 * indsubset^2 * depsubset) 
					+ I(-1 * indsubset^3 * depsubset))

  InputDatalm <- lm(initmodel)
  summary(InputDatalm)
  fitted.values(InputDatalm)
  nlsstart <- coef(InputDatalm)  
  names(nlsstart) <- c("A0", "A1", "A2", "A3", "B1", "B2", "B3")
  nlsstart <- as.list(nlsstart)
  attach(nlsstart)
  curve((A0*x^0+A1*x^1+A2*x^2+A3*x^3)/(1+B1*x^1+B2*x^2+B3*x^3), 
		from = min(K), 
		to = max(K), 
		add=TRUE, 
		col="blue", 
		lty=5, lwd=2 )

  # Check the poles of the starting function

  poles <- polyroot(c(1,B1,B2,B3))
  cat((poles), '\n')

  # Run the nls using above values as starting points

  fullmodel <- {CTE ~ (A0 * K^0 + A1 * K^1 + A2 * K^2 + A3 * K^3)/
			    (1 + B1 * K^1 + B2 * K^2 + B3 * K^3)}
  rationalfit <- nls(fullmodel,start=nlsstart)
  summary(rationalfit)
  rationalcoef <- as.list(coef(rationalfit))
  attach(rationalcoef)
  curve((A0*x^0+A1*x^1+A2*x^2+A3*x^3)/(1+B1*x^1+B2*x^2+B3*x^3), 
		from = min(K), 
		to = max(K), 
		add=TRUE, lwd=2)
  legend(600,5,c("Starting Values","Initial Value Fit","Full Model Fit","Data"),
		lty=c(0,5,1,0),
		pch=c(19,NA,NA,4),
		col=c("red","blue","black","black"),
		plot=TRUE)

  # Check the poles of the final function

  poles <- polyroot(c(1,B1,B2,B3))
  cat((poles), '\n')

  # Cleanup
  detach(InputData)
  detach(rationalcoef)
  # End of Code

Thank you,

Christopher Battles