[R] Uniroot and Newton-Raphson Anomaly

Doran, Harold HDoran at air.org
Mon Mar 16 15:34:27 CET 2009


I have the following function for which I need to find the root of a:

f <- function(R,a,c,q) sum((1 - (1-R)^a)^(1/a)) - c * q

To give context for the problem, this is a psychometric issue where R is
a vector denoting the percentage of students scoring correct on test
item i in class j, c is the proportion correct on the test by student k,
and q is the number of items on the test in total.

I have programmed this using Newton-Raphson as follows:

numer <- function(R,a,c,q){
         result <- sum((1-(1-R)^a)^(1/a))-c*q
         result 
      }

      denom <- function(R,a,c,q){
         result <- sum(-((1 - (1 - R)^a)^(1/a) * (log((1 - (1 - R)^a)) *
(1/a^2)) + 
            (1 - (1 - R)^a)^((1/a) - 1) * ((1/a) * ((1 - R)^a * log((1 -
R))))))
         result
      }

   aConst <- function(R, c, q, con){
      a <- .5 # starting value for a
      change <- 1
      while(abs(change) > con) {
	   r1 <- numer(R,a,c,q)
	   r2 <- denom(R,a,c,q)
         change <- r1/r2
         a      <- a - change
      }
      a
   }

The function now works as follows. Assume there are two test items on
the test. Assume the proportion correct on the items in class j is .2
and .4, and assume student k scored correctly on one item only.

aConst(R = c(.2,.4), c=.5, q=2, con=.001)

Now, one might notice that the first derivative of the function (in the
function denom) has in it log(1-R). In cases where all students in a
class answer the item correct, R = 1, and this creates an anomaly in
that log(0) is undefined. One cheap trick, I think, is to correct the
vector R such that any values of 1 become .999 and any values of 0
become .001 (0 is also an anomaly).

I am accustomed to Newton-Raphson and don't use bisection or the uniroot
function. So, given the anomaly above, I am thinking a change of mindset
might be necessary, although I am not certain if the same issue that
affects NR will propagate to other root finding algorithms. 

Now, to use uniroot, my understanding is that I need to start with two
guesses for a lower and upper limit of a such that:

f(x_l)*f(x_u) < 0

Where f(x_l) and f(x_u) are the lower and upper limits, respectively.
With this, I can try:

f <- function(R,a,c,q) sum((1 - (1-R)^a)^(1/a)) - c * q
uniroot(f, c(.5,2), R = c(.2,.4), c = .5, q=2)

Which gives the same root as my NR function. Now, the issue I am running
into is that this function is applied to each student in the data, which
can be in the hundreds of thousands, and the only value that is fixed is
q. The values of c vary by student and the values in the vector R vary
by class. So, as I have applied this to a larger dataset, I often can't
find the root because the values I use for the search interval are not
universal to maintain the necessary condition that f(x_l)*f(x_u) < 0 for
student k' in class j'. As a result, the error that the endpoints are
not of opposite sign appears. 

Now, it is not the error that confuses me, that I believe is rather
transparent in meaning. It is how to apply a search interval that can be
universally applied when implementing this function to many students
when the values of R and c vary.  Obviously, with hundreds of thousands
of students I cannot toy around with the function for each student to
find a search interval to maintain f(x_l)*f(x_u) < 0. So, after
pondering this over the weekend, I wonder if the list might have
reactions on the following two questions;

1) First, would the issue I note about NR having log(0) propagate into
other root finding algorithms? I realize bisection does not make use of
derivatives, and this occurs in the first derivative, but I'm not savvy
enough to see if this is an artifact of the function.

2) Second, is there a more thoughtful way to identify a search interval
that can be automatically programmed when iterating over hundreds of
thousands of cases such that "universal" values for the search interval
can be used?

Many thanks,

Harold


> sessionInfo()
R version 2.8.1 (2008-12-22) 
i386-pc-mingw32 

locale:
LC_COLLATE=English_United States.1252;LC_CTYPE=English_United
States.1252;LC_MONETARY=English_United
States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base


other attached packages:
[1] MiscPsycho_1.4 statmod_1.3.8 
> 




More information about the R-help mailing list