[R] Recursion in R ...

(Ted Harding) ted.harding at nessie.mcc.ac.uk
Fri Jul 6 18:54:30 CEST 2007


Hi Folks,

R has known speed issues for recursive definitions.
There was a thread "Extremely slow recursion in R?" in
August 2006 (24 Aug, from Jason Liao), with some
interesting comparisons between programming languages.

I'm re-opening the topic, with an ulterior motive
(stated at the end).

The starting point was the question "How many distinct
increasing sequences of length n can be made from k
integers (1:k)?", i.e. what is the size of the sample
space of

  sort(sample((1:k),n,replace=TRUE))

Let this number be Nnk(n,k). I aptly observed that

  Nnk(1,k) = k
  Nnk(n,k) = Nnk(n-1,k) + sum[r= 1 to k](Nnk(n-1,r))

So I slickly wrote a recursive definition:

Nnk<-function(n,k){
  if(n==1) {return(k)} else {
    R<-0;
    for(r in (1:k)) R<-(R+Nnk(n-1,k-r+1,depth))
  }
  return(R)
}

I then ran a few test cases, to check timing etc.:

Nnk(21, 7)   34 seconds
Nnk(21, 8)  120 seconds
Nnk(21, 9)  384 seconds
Nnk(21,10) 1141 seconds

I then complacently set it to work out the number I happened
to really want:

Nnk(21,20)

24 hours later, no result; and an extra "test line" (not
shown above) at the start of the function had prodced no
output, so the function had not even returned to the top
level for the first time.

Then, with heavy thunderstorms rolling in, and my mains
power beginning to flicker, I closed down the computer.

A quick extrapolation based on the above results, and a
few others, suggested that the time for completion of
Nnk(21,20) would be about 2 years.

Then I thought about it.

The recursion relation in fact shows that Nnk(n,k) is
the maximum value in cumsum(Nnk(n,r)) over r = 1:k.

Hence (for n>=1 and r>=1):

Nnk<-function(n,k){
  K<-rep(1,k)
  for(i in (1:n)){
    K<-cumsum(K)
  }
  return(max(K))
}

This definition then gave me, in so brief a blink of the
eye that I could make no estimate of it, the result:

Nnk(21,20) = 131282408400

In fact, I had to go up to things like

Nnk(1000,200) = 3.335367e+232

before I could sensibly perceive the delay (about 0.5 seconds
in this case).

On the old recursive definition, the computation might have
outlasted the Universe.



ON THAT BASIS: I hereby claim the all-time record for inefficient
programming in R.

Challengers are invited to strut their stuff ...

Best wishes to all, and happy end of week,
Ted.

--------------------------------------------------------------------
E-Mail: (Ted Harding) <ted.harding at nessie.mcc.ac.uk>
Fax-to-email: +44 (0)870 094 0861
Date: 06-Jul-07                                       Time: 17:53:55
------------------------------ XFMail ------------------------------



More information about the R-help mailing list