[R] Non-linear maximization function in R

Tue Oct 18 20:12:09 CEST 2011

Hello,

# Full disclosure. I am not sure if my problem is a bug(s) in the code, or a
fundamental misunderstanding on my part about what I am trying to do with
these statistics. I am not familiar with maximum likelihood tests. 

# I currently have two vectors 

Aequipecten<-c(0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0,
0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 1, 1, 1, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0,
0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0)

Samples<-c(0.03333333, 0.06666667, 0.25000000, 0.27222222, 0.27777778,
0.28888889, 0.29444444, 0.30000000, 0.31666667, 0.32777778, 0.33333333,
0.33888889, 0.35000000, 0.35555556, 0.36111111, 0.37777778, 0.38333333,
0.38888889, 0.40000000, 0.40555556, 0.41111111, 0.42222222, 0.43333333,
0.44444444, 0.45000000, 0.46111111, 0.46666667, 0.47222222, 0.47777778,
0.49444444, 0.52222222, 0.53888889, 0.55000000, 0.55555556, 0.56111111,
0.56666667, 0.57222222, 0.57777778, 0.58333333, 0.58888889, 0.60000000,
0.60555556, 0.61111111, 0.61666667, 0.62222222, 0.62777778, 0.63333333,
0.63888889, 0.64444444, 0.65000000, 0.65555556, 0.66111111, 0.66666667,
0.67222222, 0.67777778, 0.68333333, 0.68888889, 0.69444444, 0.70000000,
0.70555556, 0.71111111, 0.71666667, 0.72222222, 0.72777778, 0.73333333,
0.73888889, 0.74444444, 0.75000000, 0.75555556, 0.76111111, 0.76666667,
0.77222222, 0.77777778, 0.78333333, 0.78888889, 0.79444444, 0.80000000,
0.81111111, 0.81666667, 0.83888889, 0.85555556, 0.86666667, 0.88333333,
0.88888889, 0.89444444, 0.94444444, 0.95555556)

# What I want to do is a log-likelihood model selection process where i
check if a Gaussian fit or a skewed unimodal fit (among others) is better.

# The first thing I do is use this function written by Oksanen
http://cc.oulu.fi/~jarioksa/softhelp/hof3.pdf

HOF.start<-function(y,x,model=5) {
	if (model >=4) {
		k<-glm(y ~ x + I(x^2),family=binomial)
		k1<-k$coefficients[1]
		k2<-k$coefficients[2]
		k3<-k$coefficients[3]
		u<-(-k2/2/k3)
		h<-plogis(k1-k2^2/4/k3)
		r<-log(1/h*(-2*h+2*sqrt(h))/2)
		b<-5.07-0.227*k3
		a<-(-b*u+r)
		c<-b*u+4
		}
	else {
		k<-glm(y ~ x,family=binomial)
		k1<-k$coefficients[1]
		k2<-k$coefficients[2]
		a<-(-k1)
		b<-(-k2)
		c<-0
		}
	switch(model,
		p<-c(a=a),
		p<-c(a=a,b=b),
		p<-c(a=a,b=b,c=c),
		p<-c(a=a,b=b,c=c),
		p<-c(a=a,b=b,c=c,d=b),
		)
	p
	}

# What I am concerned about here is option (model=4). 

>p<-HOF.start(Aequipecten,Samples,model=4)
>p
 a.I(x^2)  b.I(x^2)  c.I(x^2) 
-6.520338 10.330863 10.547157 

# I am not sure if this function is working correctly (but see below for my
alternative way of deriving equivalent p's). The goal is to use these values
"p" as a starting point for the non-linear maximization.

# Next, I plug those values (p) into the non-linear maximization which uses
the following functions:

HOF.fun<-function(p,model,x,M=1) {
	p0<-min(x)
	ra<-max(x)-p0
	x<-(x-p0)/ra
	switch(model,
		fv<-rep(M/(1+exp(p[1])),length(x)),
		fv<-M/(1+exp(p[1]+p[2]*x)),
		fv<-M/(1+exp(p[1]+p[2]*x))/(1+exp(p[3])),
		fv<-M/(1+exp(p[1]+p[2]*x))/(1+exp(p[3]-p[2]*x)),
		fv<-M/(1+exp(p[1]+p[2]*x))/(1+exp(p[3]-p[4]*x))
		)
	fv
	}

HOF<-function(x,y,p,model,M=1) {
	fv<-HOF.fun(p,model,x,M=M)
	sum(-dbinom(y,M,fv,log=TRUE))
	}

# My understanding of how this is supposed to work is that the final line of
the HOF function:
"sum(-dbinom(y,M,fv,log=TRUE))" is supposed to give the negative log-odds of
that function being a good fit.

> HOF(p,x=Sample,y=Aequipecten,model=4)
[1] 63.38763

# Notice, however, that it returns a positive value, when it is supposed to
be negative. What does this mean? What am I doing wrong?

#Furthermore, if I attempt a non-linear maximization on this function, I
continue to get a positive negative log-odds and what I think are
unreasonable looking parameter estimates. What does this mean?

>sol<-nlm(HOF,p,x=Sample,y=Aequipecten,model=4)
> sol
$minimum
[1] 32.72701

$estimate
[1] -8.751336 12.687632  8.281556

$gradient
[1] -4.640475e-07 -5.290583e-07  5.618925e-08

$code
[1] 1

$iterations
[1] 16

# If I compare these results to simply doing a straight gaussian logistic
regression on the vectors I get quite a different result. I'm very confident
that this equation is working properly. I would expect the outputs of this
function to be roughly equivalent to the results from object "p" as seen
above, but they don't look similar at all. I'm not sure what this means.

tBLparamsForOneSpecies <- function(theData) {
	x <- theData[ ,1]
	y <- theData[ ,2]
	logitReg <- glm(y ~ x + I(x^2), family=binomial)
	b0 <- logitReg$coefficients[1]
	b1 <- logitReg$coefficients[2]
	b2 <- logitReg$coefficients[3]
	opt <- (-b1)/(2*b2)
	tol <- 1 / sqrt(-2*b2)
	pmax<-1/(1+exp(b1^2/(4*b2)-b0))
	theParams <- cbind(opt, tol, pmax)
	theParams
	}

>theData<-cbind(Sample,Aequipecten)
>tBLparamsForOneSpecies(theData)
> tBLparamsForOneSpecies(theData)
        opt       tol      pmax
x 0.6337473 0.1468823 0.2433407

#Nor do to the coefficients look similar to p either.

> glm(Aequipecten~Sample+I(Sample^2),family=binomial)

Call:  glm(formula = Aequipecten ~ Sample + I(Sample^2), family = binomial) 

Coefficients:
  (Intercept)       Sample  I(Sample^2)  
       -10.44          29.37         -23.18  

Degrees of Freedom: 86 Total (i.e. Null);  84 Residual
Null Deviance:	    73.38 
Residual Deviance: 68.07 	AIC: 74.07 

# So anyway, to sum up my problem, I'm concerned that the results of my
function HOF and of my non-linear maximization of HOF keep giving me a
positive log-odds. I'm not sure if this is a result of a bug in the code
somewhere, or if I actually just don't understand what I'm doing in terms of
the statistics. I've tried to provide as much information as possible, but I
really don't know where or what the problem is.

Thank you for any help,

Andrew

--
View this message in context: http://r.789695.n4.nabble.com/Non-linear-maximization-function-in-R-tp3916240p3916240.html
Sent from the R help mailing list archive at Nabble.com.