[R] Error when running Conditional Logit Model

tlumley at u.washington.edu tlumley at u.washington.edu
Fri Dec 18 20:53:39 CET 2009


On Fri, 18 Dec 2009, Hien Nguyen wrote:

> Dear Drs Winsemius and Berry,
>
> Thanks a lot for your comments and suggestions on running my model. I am new
> not just to R but to CLM as well. :( With your suggestions, I figured out
> that I had huge misunderstandings about the model and the data arrangement.
>
> After my finals, I reread the related material on CLM and rearranged my data
> appropriately before running the model in R. This time, I have data with
> more than 250,000 observations (created from more than 4000 responses) and a
> model with 15 predictors.
>
> My question is how long the clogit command should take to run, because it
> has been running for more than 10 hours on a quad-core computer and still
> shows no sign of finishing. Is that normal, or does my command just not work?

If you have a lot of records with case=1 in a stratum, conditional logistic regression will be extremely slow. It is also unnecessary: maximizing the unconditional likelihood (ordinary logistic regression with stratum as a factor) is fine when the stratum sizes are large.

Note that a quad-core computer won't help. Only one core will be used in the computations.

      -thomas
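A small simulation can illustrate the point about large strata. This is a sketch under stated assumptions: the data are simulated, the names `stratum`, `x`, `case`, and the sizes are hypothetical, and the survival package is installed. With 50 strata of 30 rows each, ordinary logistic regression with one intercept per stratum (the unconditional likelihood) and `clogit()` (the conditional likelihood) should give similar slopes. The unconditional estimate is badly biased when strata are small, such as matched pairs, so this shortcut applies only to large strata.

```r
library(survival)

# Simulated, hypothetical data: 50 strata of 30 rows, about a quarter cases.
set.seed(2009)
n_strata  <- 50
per_strat <- 30
stratum   <- rep(seq_len(n_strata), each = per_strat)
x         <- rnorm(n_strata * per_strat)
alpha     <- rnorm(n_strata, mean = -1, sd = 0.5)  # stratum-specific intercepts
case      <- rbinom(length(x), 1, plogis(alpha[stratum] + 0.5 * x))
d <- data.frame(case, x, stratum)

# Unconditional likelihood: ordinary logistic regression with one
# intercept per stratum (stratum entered as a factor).
fit_glm <- glm(case ~ x + factor(stratum), family = binomial, data = d)

# Conditional likelihood: clogit() with its default (exact) method.
fit_cl <- clogit(case ~ x + strata(stratum), data = d)

b_glm <- unname(coef(fit_glm)["x"])
b_cl  <- unname(coef(fit_cl)["x"])
print(c(unconditional = b_glm, conditional = b_cl))
```

If the exact computation is still too slow on data like Hien's, `clogit()` also has a `method` argument (for example `method = "approximate"`); the available choices may depend on the survival version, so check `?clogit`.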




> Thanks a lot for your response
>
> Hien
>
>
> Charles C. Berry wrote:
>> On Fri, 4 Dec 2009, David Winsemius wrote:
>> 
>>> 
>>> On Dec 4, 2009, at 5:49 PM, Hien Nguyen wrote:
>>> 
>>>> Dear Dr. Winsemius,
>>>> 
>>>> Thank you very much for your reply.
>>>> 
>>>> I have tried many possible combinations (even a model with only 2
>>>> predictors), but it produces the same message. With more than 4000
>>>> observations, I think 14 predictors should not be too many.
>>> 
>>> It is what happens in the factor combinations that concerns me. I am
>>> guessing that some of those predictors are factors. You really should not 
>>> ask r-help questions without providing better descriptions of both the 
>>> outcomes and the predictor variables.
>>> 
>>>> 
>>>> Although my dependent variable (Pin) is not discrete (it ranges from 0
>>>> to 1), I do not think it will create problems for the estimation, but I'm
>>>> not sure.
>>> 
>>> I would think it _would_ cause problems. As I understand it, conditional 
>>> methods create contingency tables. Why are you using an outcome type that 
>>> is not consistent with the fundamental regression assumptions of the 
>>> clogit function?
>>> 
>>> I do not get that particular error when I munge the infert dataset to have 
>>> case be a random uniform value, but I do get an error.
>>>>  infert$case <- runif(nrow(infert))
>>>>  clogit(case~spontaneous+induced+strata(stratum),data=infert)
>>> Error in Surv(rep(1, 248L), case) : Invalid status value
>>> 
>> 
>> David, I think you were on the right track. I get this:
>> 
>> -----------
>>> clogit(I(case*runif(length(case)))~spontaneous+induced+strata(ifelse(stratum>40,NA,stratum)),data=infert) 
>> Error in fitter(X, Y, strats, offset, init, control, weights = weights,  :
>>   NA/NaN/Inf in foreign function call (arg 6)
>> In addition: Warning messages:
>> 1: In Surv(rep(1, 248L), I(case * runif(length(case)))) :
>>   Invalid status value, converted to NA
>> 2: In fitter(X, Y, strats, offset, init, control, weights = weights,  :
>>   Ran out of iterations and did not converge
>>> 
>> ------------
>> 
>> which looks pretty much the same as Hien's error message.
>> 
>> So Hien needs to create a logical (or 0/1) status value.
>> 
>> Chuck
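A minimal sketch of that fix, assuming the survival package is installed. `as_case_status()` is a hypothetical helper (not part of survival) that refuses anything other than 0/1 or TRUE/FALSE before the data reach `clogit()`, which, as the error messages in this thread show, wraps the response in `Surv(rep(1, n), case)`. How a 0-1 proportion such as Pin should become a binary choice indicator is a modelling decision this sketch does not make.

```r
library(survival)

# Hypothetical helper (not part of survival): accept only a binary case
# indicator, because clogit() passes the response to Surv(rep(1, n), case).
as_case_status <- function(x) {
  if (is.logical(x)) x <- as.integer(x)
  bad <- is.na(x) | !(x %in% c(0, 1))
  if (any(bad)) {
    stop(sprintf("%d value(s) are not 0/1 or TRUE/FALSE; clogit() needs a binary case indicator",
                 sum(bad)))
  }
  as.integer(x)
}

# A continuous outcome like the runif() one in this thread is rejected up front ...
msg <- tryCatch(as_case_status(runif(10)), error = conditionMessage)
print(msg)

# ... while the 0/1 case variable in the built-in infert data passes.
infert2 <- transform(infert, case = as_case_status(case))
fit <- clogit(case ~ spontaneous + induced + strata(stratum), data = infert2)
print(coef(fit))
```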
>> 
>> p.s.
>> 
>>> sessionInfo()
>> R version 2.10.0 (2009-10-26)
>> i386-pc-mingw32
>> 
>> locale:
>> [1] LC_COLLATE=English_United States.1252
>> [2] LC_CTYPE=English_United States.1252
>> [3] LC_MONETARY=English_United States.1252
>> [4] LC_NUMERIC=C
>> [5] LC_TIME=English_United States.1252
>> 
>> attached base packages:
>> [1] splines   stats     graphics  grDevices utils     datasets  methods
>> [8] base
>> 
>> other attached packages:
>> [1] survival_2.35-7
>> 
>> loaded via a namespace (and not attached):
>> [1] tools_2.10.0
>>> 
>> 
>> 
>>> So I certainly would not have proceeded to submit a full analysis to 
>>> clogit if I could not get a test case to run under the situation you 
>>> propose.
>>> 
>>> -- 
>>> David
>>> 
>>>> 
>>>> I have checked the collinearity among predictors and they are all < 0.5
>>>> (which I think is OK). Do you know what else could cause these errors?
>>>> 
>>>> Thanks a lot
>>>> 
>>>> Hien Nguyen
>>>> 
>>>> David Winsemius wrote:
>>>>> On Dec 4, 2009, at 9:22 AM, Hien Nguyen wrote:
>>>>>
>>>>>> Dear R-helpers,
>>>>>>
>>>>>> I am very new to R and am trying to run the conditional logit model
>>>>>> using the "clogit" command.
>>>>>> I have more than 4000 observations in my dataset and am trying to
>>>>>> predict the dependent variable from 14 independent variables. My
>>>>>> command is as follows:
>>>>>>
>>>>>> clmtest1 <-
>>>>>> clogit(Pin~Income+Bus+Pop+Urbpro+Health+Student+Grad+NE+NW+NCC+SCC+CH+SE+MRD+strata(IDD),data=clmdata)
>>>>>>
>>>>>> However, it produces the following errors:
>>>>>>
>>>>>> Error in fitter(X, Y, strats, offset, init, control, weights = weights,  :
>>>>>>   NA/NaN/Inf in foreign function call (arg 6)
>>>>>> In addition: Warning messages:
>>>>>> 1: In Surv(rep(1, 4096L), Pinmig) : Invalid status value, converted to NA
>>>>>> 2: In fitter(X, Y, strats, offset, init, control, weights = weights,  :
>>>>>>   Ran out of iterations and did not converge
>>>>>>
>>>>>> I searched for the error message on R forums, but found nothing
>>>>>> specific to the conditional logit model.
>>>>>
>>>>> With that many predictors in a small dataset, you may have created matrix
>>>>> singularities. Perhaps you created a stratum where all of the subjects
>>>>> experience the event and others where none did so. The coefficients might
>>>>> be driven to infinities. Try simplifying the model.
>>>>>
>>>>>> Please tell me what this means and what I should do to solve it.
>>>>>
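Both of David's suspects, strata in which everyone (or no one) is a case and a singular design matrix, can be checked before fitting. This is a sketch: `stratum_summary()` and `design_is_full_rank()` are hypothetical helpers written for illustration, and the examples use the built-in infert data. A stratum where all rows, or no rows, are cases contributes nothing to the conditional likelihood, and a rank-deficient model matrix means some coefficients cannot be estimated.

```r
# Hypothetical helper: count rows and cases per stratum and flag strata in
# which nobody, or everybody, is a case.
stratum_summary <- function(case, stratum) {
  n     <- tapply(case, stratum, length)
  cases <- tapply(case, stratum, sum)
  data.frame(stratum    = names(n),
             n          = as.vector(n),
             cases      = as.vector(cases),
             degenerate = as.vector(cases == 0 | cases == n))
}

# Hypothetical helper: TRUE if the model matrix for the right-hand side of
# `formula` has full column rank (no exact linear dependence among columns).
design_is_full_rank <- function(formula, data) {
  X <- model.matrix(formula, data)
  qr(X)$rank == ncol(X)
}

s <- stratum_summary(infert$case, infert$stratum)
print(table(s$degenerate))

print(design_is_full_rank(~ spontaneous + induced, infert))
# A column that is an exact multiple of another makes the design singular.
infert3 <- transform(infert, spont2 = 2 * spontaneous)
print(design_is_full_rank(~ spontaneous + spont2 + induced, infert3))
```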
>>> 
>>> David Winsemius, MD
>>> Heritage Laboratories
>>> West Hartford, CT
>>> 
>>> ______________________________________________
>>> R-help at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide 
>>> http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>> 
>> 
>> Charles C. Berry                            (858) 534-2098
>>                                             Dept of Family/Preventive Medicine
>> E mailto:cberry at tajo.ucsd.edu                UC San Diego
>> http://famprevmed.ucsd.edu/faculty/cberry/  La Jolla, San Diego 92093-0901
>> 
>> 
>

Thomas Lumley			Assoc. Professor, Biostatistics
tlumley at u.washington.edu	University of Washington, Seattle



