[R] optimization challenge

Thu Jan 28 22:47:29 CET 2010

Well, Albyn Jones gave a great solution to my challenge that found the best reading schedule.

My original thought was that doing an exhaustive search would take too much time, but Albyn showed that there are ways to do it efficiently.

My approach (as mentioned before) was to use optim with method SANN.  I treated it like a balls and urns problem.  Since I wanted to read 239 chapters in 128 days it would be like putting 239 balls in 128 urns.  I started with 1 ball in each urn, then put the remain balls into urns to get a starting state (tried a couple different starting situations, 1 additional ball in each of the 1st urns, all the remaining balls in the first or last urn).  Then my update step was just to take a single ball from one of the urns with 2 or more balls and move it to another urn at random.

On tricky thing with this method is that moving one ball could change things quite a bit because one setup could have the longest chapters being read by themselves, but moving one ball would result in a long chapter now being grouped with others.  My first update function just moved the ball to a random urn, then I tried moving the ball only one urn forward or backwards (this seemed to work better, but probably needed a longer run time).  Finally the best method that I found chose the ball to move proportional to the lengths of the days reading and chose the urn to put it in with highest probability for days with the shortest readings.

I thought my answers were pretty good (they looked reasonable), but Albyn's solution had half the variance as my best result.  Below is the code that I used for my best results in case anyone is interested.  I would also be interested if anyone could find a way to improve on what I did to get better results (help me learn SANN better, the arguments I used came mostly from trial and error).

days <- seq( as.Date('1/24/10',"%m/%d/%y"), as.Date('5/31/10','%m/%d/%y'), by=1)

sq2.2 <- rep(1, length(days))
sq2.2[ length(days) ] <- nrow(bom3) -sum(sq2.2) + 1

genseq4 <- function(sq) {
	w <- rep(1:length(days), sq)
	tmp <- tapply( bom3$Verses, w, sum )

	ww <- which(sq>1)

	dwn <- if (length(ww) > 1) {
		sample( ww, 1, prob= tmp[ww] )
	} else {
		ww
	}
	up  <- sample( seq_along(sq)[-dwn], 1, prob=max(tmp) - tmp[-dwn] )

	sq[dwn] <- sq[dwn] - 1
	sq[up]  <- sq[up]  + 1

	sq
}

distance <- function(sq) {
	w <- rep(1:length(days), sq)
	tmp <- tapply( bom3$Verses, w, sum )
	var(tmp)
}

res <- optim(sq2.2, distance, genseq4, method="SANN",
	control=list(maxit=30000, temp=50, trace=TRUE ))

-- 
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.snow at imail.org
801.408.8111