[R] optim() for ordered logit model with parallel regression assumption

Xu Jun junxu.r at gmail.com
Thu Aug 2 00:43:01 CEST 2012


I am not that proficient in R. I found some codes on web using those
indicator variables to sum up log likelihood. I block out bols in the
codes, but I also tried using them as start value for my estimation of
ologit. Didn't work. Thanks for your suggestion.

Jun

On Wed, Aug 1, 2012 at 5:44 PM, Bert Gunter <gunter.berton at gene.com> wrote:
> Disclaimer: I have not followed this thread at all, but only wish to note:
>
> 1) Indicator variables are (almost?) never needed in R -- that you are
> fooling with them suggests that there is probably a better approach.
>
> 2) Your bols is just least regression, no? -- If so, there are far
> better ways to do this in R.
>
> 3) ?model.frame and ?model.matrix may be relevant to what you're trying to do.
>
> While you may get the help you need on this list (though clearly not
> from me!), I suspect that you would benefit from consulting with a
> local statistician/R expert who would help you with a major rewrite of
> your code.
>
> But ... note my disclaimer again at the top.
>
> Cheers,
> Bert
>
> On Wed, Aug 1, 2012 at 2:34 PM, Xu Jun <junxu.r at gmail.com> wrote:
>> Thanks Michael. Now I switched my approach after doing some google.
>> Following are my new codes:
>>
>> ###################################################
>>  library(foreign)
>>  readin <- read.dta("ordfile.dta", convert.factors=FALSE)
>>  myvars <- c("depvar", "x1", "x2", "x3")
>>  mydta <- readin[myvars]
>>  # remove all missings
>> mydta <- na.omit(mydta)
>>
>> ologit.lf <- function(theta, y, X) {
>>   X <- as.matrix(X)
>>   y <- as.matrix(y)
>>   n <- nrow(X)
>>   k <- ncol(X)
>>   b <- theta[1:k]
>>   tau1 <- as.numeric(theta [k+1])
>>   tau2 <- as.numeric(theta [k+2])
>>   tau3 <- as.numeric(theta [k+3])
>>
>>   p1 <- (1/(1+exp( - tau1 + X %*% b)))
>>   p2 <- (1/(1+exp( - tau2 + X %*% b))) - (1/(1+exp( - tau1 + X %*% b)))
>>   p3 <- (1/(1+exp( - tau3 + X %*% b))) - (1/(1+exp( - tau2 + X %*% b)))
>>   p4 <- 1 - (1/(1+exp( - tau3 + X %*% b)))
>>
>>   # create indicator variables
>>   i  <- rep(1, nrow(X))
>>   i1 <- as.numeric(  i == y)
>>   i2 <- as.numeric(2*i == y)
>>   i3 <- as.numeric(3*i == y)
>>   i4 <- as.numeric(4*i == y)
>>
>>   llik <- sum(i1*log(p1) + i2*log(p2) + i3*log(p3) + i4*log(p4))
>>   return(-llik)
>> }
>>
>> y <- (mydta$depvar)
>> X <- cbind(mydta$x1, mydta$x2, mydta$x3)
>>
>> if (FALSE) { # providing start values in optim
>>   xint <-  cbind(rep(1,nrow(X)),X)
>>   bols <- (solve(t(xint) %*% xint)) %*% (t(xint) %*% y)
>>   startval <- rbind(as.matrix(bols[2:nrow(bols)]),0,0,0)
>> }
>>
>> runopt <- optim(rep(0, ncol(X)+3), ologit.lf, method="BFGS",
>> hessian=T, y=y, X=X)
>> #########################################################################
>>
>> But then I got the following error message;
>> Error in optim(rep(0, ncol(X) + 3), ologit.lf, method = "BFGS", hessian = T,  :
>>   initial value in 'vmmin' is not finite
>>
>> I know there is something wrong with the way that I set up my initial
>> values, but just couldn't figure out how. Any help would be greatly
>> appreciated!
>>
>> Jun
>>
>> On Tue, Jul 31, 2012 at 10:07 PM, R. Michael Weylandt
>> <michael.weylandt at gmail.com> wrote:
>>> On Tue, Jul 31, 2012 at 7:57 PM, Xu Jun <junxu.r at gmail.com> wrote:
>>>> Dear R listers,
>>>>
>>>> I am learning the MLE utility optim() in R to program ordered logit
>>>> models just as an exercise. See below I have three independent
>>>> variables, x1, x2, and x3. Y is coded as ordinal from 1 to 4. Y is not
>>>> yet a factor variable here. The ordered logit model satisfies the
>>>> parallel regression assumption. The following codes can run through,
>>>> but results were totally different from what I got using the polr
>>>> function from the MASS package. I think it might be due to the way how
>>>> the p is constructed in the ologit.lf function. I am relatively new to
>>>> R, and here I would guess probably something related to the class type
>>>> (numeric vs. matrix) or something along that line among those if
>>>> conditions. Thanks in advance for any suggestion.
>>>>
>>>> Jun Xu, PhD
>>>> Assistant Professor
>>>> Department of Sociology
>>>> Ball State University
>>>> Muncie, IN 47306
>>>>
>>>>
>>>> ####################################################################
>>>>
>>>> library(foreign)
>>>> readin <- read.dta("ordfile.dta", convert.factors=FALSE)
>>>> myvars <- c("depvar", "x1", "x2", "x3")
>>>> mydta <- readin[myvars]
>>>> # remove all missings
>>>> mydta <- na.omit(mydta)
>>>>
>>>> # theta is the parameter vector
>>>> ologit.lf <- function(theta, y, X) {
>>>>   n <- nrow(X)
>>>>   k <- ncol(X)
>>>> # b is the coefficient vector for independent variables
>>>>   b <- theta[1:k]
>>>> # tau1 is cut-point 1
>>>>   tau1 <- theta [k+1]
>>>> # tau2 is cut-point 2
>>>>   tau2 <- theta [k+2]
>>>> # tau3 is cut-point 1
>>>>   tau3 <- theta [k+3]
>>>>
>>>>   if (y == 1){
>>>>     p <- (1/(1+exp( - tau1 + X %*% b)))
>>>>   }
>>>>   if (y == 2) {
>>>>     p <- (1/(1+exp( - tau2 + X %*% b))) - (1/(1+exp( - tau1 + X %*% b)))
>>>>   }
>>>>   if (y == 3) {
>>>>     p <- (1/(1+exp( - tau3 + X %*% b))) - (1/(1+exp( - tau2 + X %*% b)))
>>>>   }
>>>>   if (y == 4) {
>>>>     p <- 1 - (1/(1+exp( - tau3 + X %*% b)))
>>>>   }
>>>>   - sum(p)
>>>> }
>>>>
>>>> y <- as.numeric(mydta$depvar)
>>>> X <- cbind (mydta$x1, mydta$x2, mydta$x3)
>>>> runopt <- optim(rep(0, ncol(X)+4), ologit.lf, method="BFGS",
>>>> hessian=T, y=y, X=X)
>>>>
>>>>
>>>> There were 50 or more warnings (use warnings() to see the first 50)
>>>>> warnings()
>>>>
>>>> 1: In if (y == 1) { ... :
>>>>   the condition has length > 1 and only the first element will be used
>>>> 2: In if (y == 2) { ... :
>>>>   the condition has length > 1 and only the first element will be used
>>>
>>> It looks like you've got a fundamental problem in your if/else
>>> statements. if and else are control structures and so they operate on
>>> the whole program flow -- I think you want the ifelse() function here
>>> instead.
>>>
>>> Take a look at this example:
>>>
>>> x <- c(1, 5, 9)
>>>
>>> if(x < 3) {y <- x^2} else {y <- 2}
>>>
>>> z <- ifelse(x < 3, x^2, 2)
>>>
>>> print(x)
>>> print(y)
>>> print(z)
>>>
>>> See also ?ifelse
>>>
>>> Best,
>>> Michael
>>>
>>>>
>>>> ______________________________________________
>>>> R-help at r-project.org mailing list
>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>>> and provide commented, minimal, self-contained, reproducible code.
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>
>
>
> --
>
> Bert Gunter
> Genentech Nonclinical Biostatistics
>
> Internal Contact Info:
> Phone: 467-7374
> Website:
> http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm



More information about the R-help mailing list