[R] regression with restrictions - optimization problem

Fri Sep 9 17:43:32 CEST 2005

Dear WizaRds!

I am sorry to ask for some help, but I have come to a complete stop in 
my efforts. I hope, though, that some of you might find the problem 
quite interesting to look at.

I have been trying to estimate parameters for lotteries, the so called 
utility of chance, i.e. the "felt" probability compared to a rational 
given probability. A real brief example: Given is a lottery payoff 
matrix of the type

x1    x2 ...     x10     median
1000    5000 ... 5000    3750
0    1000 ... 5000    2250 etc.

The actual data frame consists of 11 columns and 28 rows.

Each entry x1 ... x10 gives the amount of money resp. the utility of 
that amount you receive playing the lottery. The probability for each 
column is 10%. The median represents the empirical answers of players 
where the person is indifferent if they prefer to receive the lottery or 
the sum of money as a sure payoff.

I try to determine the probability people feel instead of the known 10% 
probability of each column payoff entry. But here's the catch:

People also give different utilities to each amount of money, which 
basically gives us some sort of function like this:
u(x1...x10) = u(x1)*pi(p1) + u(x2)*pi(p2) +...+u(x10)*pi(p10)=y
u() - unknown utility function
pi() - unknown probability function
y - empirical answer
p1..p10 - probabilities, here always 0.1

To keep it simple, I set u(0)=0 and u(5000)=5000 and vary u(1000) 
between a start and end point. On each cycle R computes the regression 
coefficients that serve as the pi(p) estimators for every 10% step.
Then I minimize the residual sum of squares which should give the best 
estimators for every 10% step.

How can I possibly calculate a "smooth" pi(p) curve, a curve that should 
look like an "S", plotted against the cumulative 10% probabilities? I 
only have my ten estimators. How can I possibly tell R the necessary 
restrictions of nonnegative estimators and their sum to equal one? Here 
is my quite naive approach:

a70 <- matrix(c(1000,5000,5000,5000,2150, 0,1000,5000,5000,1750, 
0,0,1000,5000,1150, 0,0,0,1000,200, 1000,1000,5000,5000,2050, 
0,1000,1000,5000,1972), ncol=5, byrow=T)
colnames(a70)=c(paste("x", 1:4, sep=""), "med")
a70 <- as.data.frame(a70)

start=800; end=2000
step=10; u1000=start-step

u1000 <- u1000+step # varying the 1000 entry
a70[a70==1000] <- u1000
reg70 <- lm(a70$med ~ -1+x1+x2+x3+x4, data=a70)
res <- sum( (reg70$residuals^2) )

for (i in 1:( (end-start)/step) ){
         a70[a70==u1000]    <- u1000+step
     u1000 <- u1000+step
     reg70 <- lm(a70$med ~ -1+x1+x2+x3+x4, data=a70)
     if (res >= sum( (reg70$residuals^2) )) {
         res <- sum( (reg70$residuals^2) )
         print(paste("cycle", i, "u1000=", u1000, "RSS=", res))
         final70 <- a70
         finalreg <- reg70
         }
}
print(final70)
summary(finalreg)

     Maybe a better approach works with optim(stats) or dfp(Bhat), but I 
have no idea how to correctly approach such a restricted optimization 
problem.
     Thank you su much for your help and support.

Mark Hempelmann