[R] Error when running Conditional Logit Model

Hien Nguyen hien.nmsu at gmail.com
Sat Dec 19 01:39:37 CET 2009


Thanks a lot for answering my questions.

I have tried running clogit on only 64 observations with 4 
independent variables, and it finishes instantly. However, 
when I run the same command (still with only 4 independent variables) 
on the full data, it has now been running for 50 minutes. :(
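
A quick way to see why the full data behaves so differently from a small subset is to count, per stratum, how many rows there are and how many of them have case = 1. The sketch below uses made-up data; the column names `stratum` and `case` are assumptions for illustration, not the names used in this thread.

```r
# Hypothetical toy data: two strata, one with a single event, one with several.
d <- data.frame(
  stratum = c(1, 1, 1, 2, 2, 2, 2),
  case    = c(1, 0, 0, 1, 1, 1, 0)
)

# Rows per stratum and events (case == 1) per stratum.
size   <- tapply(d$case, d$stratum, length)
events <- tapply(d$case, d$stratum, sum)

# One summary row per stratum.
summary_by_stratum <- data.frame(
  stratum = names(events),
  size    = as.vector(size),
  events  = as.vector(events)
)
print(summary_by_stratum)
```

Strata with a single event are cheap for the conditional likelihood; strata with very many events are the ones that can make the fit extremely slow.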

Thomas, what do you mean by "maximizing the unconditional likelihood is 
fine when the stratum sizes are large"? What I put in "strata (__)" is 
actually the possible choices (1-64). Each choice is recorded more 
than 4000 times (which means I have more than 4000 values of 1, 4000 
values of 2 and so on).
Does that sound right?
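
For what it is worth, "maximizing the unconditional likelihood" can be read as fitting an ordinary logistic regression with one intercept per stratum instead of conditioning the strata out. A minimal sketch on simulated data follows; the variable names, the number of strata, and the true coefficient 0.7 are all assumptions chosen for illustration, not values from this thread.

```r
set.seed(42)

# Hypothetical setup: a few large strata, each with its own baseline risk.
n_strata <- 5
n_per    <- 1000
d <- data.frame(
  stratum = rep(seq_len(n_strata), each = n_per),
  x       = rnorm(n_strata * n_per)
)
alpha  <- c(-1, -0.5, 0, 0.5, 1)   # stratum-specific intercepts (assumed)
d$case <- rbinom(nrow(d), 1, plogis(alpha[d$stratum] + 0.7 * d$x))

# Unconditional fit: ordinary logistic regression with stratum indicators.
fit <- glm(case ~ x + factor(stratum), family = binomial, data = d)

# Slope estimate for x; with strata this large it should sit near the
# simulated value.
beta_x <- unname(coef(fit)["x"])
print(beta_x)
```

With only a handful of large strata, the stratum intercepts are few and well estimated, which is why the unconditional fit is reasonable here; with many small strata it would not be.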

Thanks a lot

Hien

tlumley at u.washington.edu wrote:
> On Fri, 18 Dec 2009, Hien Nguyen wrote:
>
>> Dear Drs Winsemius and Berry,
>>
>> Thanks a lot for your comments and suggestions on running my model. I 
>> am not just new to R but new to CLM as well. :( With your 
>> suggestions, I figured out that I had huge misunderstandings about 
>> the model and data arrangement.
>>
>> After my finals, I read the related materials on CLM again and 
>> rearranged the data appropriately before running the model in R. 
>> This time, I have data with more than 250,000 observations (created 
>> from more than 4000 responses) and a model with 15 predictors.
>>
>> My question is how long the clogit command should take to run, 
>> because it has been running for more than 10 hours on a quad-core 
>> computer and still shows no sign of finishing. Is that OK, or does 
>> my command just not work?
>
> If you have a lot of records with case=1 in a stratum, conditional 
> logistic regression will be extremely slow.   And unnecessary: 
> maximizing the unconditional likelihood is fine when the stratum sizes 
> are large.
>
> Note that a quad-core computer won't help. Only one core will be used 
> in the computations.
>
>      -thomas
>
>
>
>
>> Thanks a lot for your response
>>
>> Hien
>>
>>
>> Charles C. Berry wrote:
>>> On Fri, 4 Dec 2009, David Winsemius wrote:
>>>
>>>>
>>>> On Dec 4, 2009, at 5:49 PM, Hien Nguyen wrote:
>>>>
>>>>> Dear Dr. Winsemius,
>>>>>
>>>>> Thank you very much for your reply.
>>>>>
>>>>> I have tried many possible combinations (even with the model of 
>>>>> only 2 predictors) but it produces the same message. With more 
>>>>> than 4000 observations, I think 14 predictors might not be too many.
>>>>
>>>> It is what happens in the factor combinations that concerns me. I am 
>>>> guessing that some of those predictors are factors. You really 
>>>> should not ask r-help questions without providing better 
>>>> descriptions of both the outcomes and the predictor variables.
>>>>
>>>>>
>>>>> Although my dependent variable (Pin) is not discrete (it ranges 
>>>>> from 0 to 1), I do not think it will create problems for the 
>>>>> estimation, but I'm not sure.
>>>>
>>>> I would think it _would_ cause problems. As I understand it, 
>>>> conditional methods create contingency tables. Why are you using an 
>>>> outcome type that is not consistent with the fundamental regression 
>>>> assumptions of the clogit function?
>>>>
>>>> I do not get that particular error when I munge the infert dataset 
>>>> to have case be a random uniform value, but I do get an error.
>>>> > infert$case <- runif(nrow(infert))
>>>> > clogit(case~spontaneous+induced+strata(stratum),data=infert)
>>>> Error in Surv(rep(1, 248L), case) : Invalid status value
>>>>
>>>
>>> David, I think you were on the right track. I get this:
>>>
>>> -----------
>>>> clogit(I(case*runif(length(case)))~spontaneous+induced+strata(ifelse(stratum>40,NA,stratum)),data=infert) 
>>>
>>> Error in fitter(X, Y, strats, offset, init, control, weights = 
>>> weights,  :
>>>   NA/NaN/Inf in foreign function call (arg 6)
>>> In addition: Warning messages:
>>> 1: In Surv(rep(1, 248L), I(case * runif(length(case)))) :
>>>   Invalid status value, converted to NA
>>> 2: In fitter(X, Y, strats, offset, init, control, weights = weights,  :
>>>   Ran out of iterations and did not converge
>>>>
>>> ------------
>>>
>>> which looks pretty much the same as Hien's error msg
>>>
>>> So Hien needs to create a logical status value.
>>>
>>> Chuck
>>>
>>> p.s.
>>>
>>>> sessionInfo()
>>> R version 2.10.0 (2009-10-26)
>>> i386-pc-mingw32
>>>
>>> locale:
>>> [1] LC_COLLATE=English_United States.1252
>>> [2] LC_CTYPE=English_United States.1252
>>> [3] LC_MONETARY=English_United States.1252
>>> [4] LC_NUMERIC=C
>>> [5] LC_TIME=English_United States.1252
>>>
>>> attached base packages:
>>> [1] splines   stats     graphics  grDevices utils     datasets  methods
>>> [8] base
>>>
>>> other attached packages:
>>> [1] survival_2.35-7
>>>
>>> loaded via a namespace (and not attached):
>>> [1] tools_2.10.0
>>>>
>>>
>>>
>>>> So I certainly would not have proceeded to submit a full analysis 
>>>> to clogit if I could not get a test case to run under the situation 
>>>> you propose.
>>>>
>>>> -- 
>>>> David
>>>>
>>>>>
>>>>> I have checked the collinearity among predictors and they are all 
>>>>> < 0.5 (which I think is OK). Do you know what else could cause this 
>>>>> error?
>>>>>
>>>>> Thanks a lot
>>>>>
>>>>> Hien Nguyen
>>>>>
>>>>> David Winsemius wrote:
>>>>> > On Dec 4, 2009, at 9:22 AM, Hien Nguyen wrote:
>>>>> > > Dear R-helpers,
>>>>> > >
>>>>> > > I am very new to R and trying to run the conditional logit
>>>>> > > model using "clogit " command.
>>>>> > > I have more than 4000 observations in my dataset and try to
>>>>> > > predict the dependent variable from 14 independent variables.
>>>>> > > My command is as follows
>>>>> > >
>>>>> > > clmtest1 <-
>>>>> > > clogit(Pin~Income+Bus+Pop+Urbpro+Health+Student+Grad+NE+NW+NCC+SCC+CH+SE+MRD+strata(IDD),data=clmdata)
>>>>> > >
>>>>> > > However, it produces the following errors:
>>>>> > >
>>>>> > > Error in fitter(X, Y, strats, offset, init, control, weights = weights, :
>>>>> > > NA/NaN/Inf in foreign function call (arg 6)
>>>>> > > In addition: Warning messages:
>>>>> > > 1: In Surv(rep(1, 4096L), Pinmig) : Invalid status value, converted to NA
>>>>> > > 2: In fitter(X, Y, strats, offset, init, control, weights = weights, :
>>>>> > > Ran out of iterations and did not converge
>>>>> > >
>>>>> > > I search the error message from R forums but it does not say
>>>>> > > anything for Conditional Logit Model.
>>>>> >
>>>>> > With that many predictors in a small dataset, you may have
>>>>> > created matrix singularities. Perhaps you created a stratum
>>>>> > where all of the subjects experience the event and others where
>>>>> > none did so. The coefficients might be driven to infinities. Try
>>>>> > simplifying the model.
>>>>> >
>>>>> > > Please check for me what it says and what should I do to
>>>>> > > solve it.
>>>>> >
>>>>
>>>> David Winsemius, MD
>>>> Heritage Laboratories
>>>> West Hartford, CT
>>>>
>>>> ______________________________________________
>>>> R-help at r-project.org mailing list
>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>> PLEASE do read the posting guide 
>>>> http://www.R-project.org/posting-guide.html
>>>> and provide commented, minimal, self-contained, reproducible code.
>>>>
>>>
>>> Charles C. Berry                            (858) 534-2098
>>>                                             Dept of 
>>> Family/Preventive Medicine
>>> E mailto:cberry at tajo.ucsd.edu                UC San Diego
>>> http://famprevmed.ucsd.edu/faculty/cberry/  La Jolla, San Diego 
>>> 92093-0901
>>>
>>>
>>
>
> Thomas Lumley            Assoc. Professor, Biostatistics
> tlumley at u.washington.edu    University of Washington, Seattle
>
