[R] Error when running Conditional Logit Model

Hien Nguyen hunghien2 at gmail.com
Sat Dec 19 11:36:31 CET 2009


On 12/18/09 22:24, Charles C. Berry wrote:
> On Fri, 18 Dec 2009, Hien Nguyen wrote:
>
>> Thanks a lot for answering my questions.
>>
>> I have tried running clogit on only 64 observations and 4
>> independent variables, and the results came back instantly. However,
>> when I run the same command (with only 4 independent variables) on the
>> full data, it has been running for 50 minutes now. :(
>>
>> Thomas, what do you mean by "maximizing the unconditional likelihood
>> is fine when the stratum sizes are large"? What I put in "strata
>> (__)" is actually the possible choices (1-64). Each choice is
>> recorded more than 4000 times (which means I have more than 4000
>> values of 1, 4000 values of 2 and so on).
>> Does that sound right?
>
> So you have 64 cases and more than 250000 controls.
>

No, I have 4096 cases and more than 250000 controls. Each case results
in 63 controls (which I have to create from each case).
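
To make the layout concrete, here is a minimal sketch of how such data might be built, one row per (respondent, alternative) pair. The names n_resp, n_alt, chosen and long are made up for illustration and the choices are simulated; this is not my actual data.

```r
# Hypothetical layout: one row per (respondent, alternative) pair.
set.seed(42)
n_resp <- 5     # illustration only; the thread has 4096 respondents
n_alt  <- 64    # the 64 possible choices

# Simulated chosen alternative for each respondent
chosen <- sample.int(n_alt, n_resp, replace = TRUE)

# expand.grid varies its first argument fastest, so rows are grouped by id
long <- expand.grid(alt = seq_len(n_alt), id = seq_len(n_resp))

# case = 1 for the chosen alternative, 0 for the 63 created controls
long$case <- as.integer(long$alt == chosen[long$id])

table(long$case)

# Row counts implied by the numbers in this thread
c(cases = 4096, controls = 4096 * 63, rows = 4096 * 64)
```

With 4096 respondents this gives 258048 controls, which matches the "more than 250,000 observations" I mentioned earlier.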

> Large strata will really slow down clogit. But I think that that isn't
> your problem.
>
> If the strata really matter - in the sense that the conditional
> distributions of covariates for controls vary a lot from stratum to
> stratum - then you really gain little by having more than a handful of
> controls for each case. If that is the situation you are in, sampling
> a couple of dozen controls from the stratum of each case will give you
> results that are very nearly as precise as those obtained from using
> all 4000 of them:
>
>     plot( 1:100, (1 + 1/1:100), xlab='n of controls',
>         ylab='relative variance of coef' )
>
>
> will give you a rough idea of the impact of increasing the number of
> controls per case. The variance with 1 control per case is 2; at the
> asymptote it is 1.
>
> So you can probably speed things up a lot by using fewer controls with
> little loss in accuracy.

I think I might need to use this.
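
A rough sketch of what that subsampling might look like, assuming the data are arranged with one case per matched set and the set identifier as the stratum (that arrangement, the simulated data, and the names d, d_sub, k, n_sets and n_alt are my assumptions, not something given in this thread):

```r
library(survival)

set.seed(1)
n_sets <- 200   # hypothetical number of matched sets
n_alt  <- 64    # one case plus 63 controls per set

# Synthetic data: one row per (set, alternative) with a single covariate x
d <- data.frame(set = rep(seq_len(n_sets), each = n_alt),
                x   = rnorm(n_sets * n_alt))

# Pick exactly one case per set, with probability proportional to exp(0.5 * x)
d$case <- 0L
for (s in seq_len(n_sets)) {
  idx <- which(d$set == s)
  d$case[sample(idx, 1, prob = exp(0.5 * d$x[idx]))] <- 1L
}

# Keep every case plus a random sample of k controls from the same set
k <- 20
keep <- unlist(lapply(split(seq_len(nrow(d)), d$set), function(idx) {
  cases    <- idx[d$case[idx] == 1L]
  controls <- idx[d$case[idx] == 0L]
  c(cases, controls[sample.int(length(controls), min(k, length(controls)))])
}))
d_sub <- d[keep, ]

# Fit on the full data and on the subsample, then compare estimates
fit_full <- clogit(case ~ x + strata(set), data = d)
fit_sub  <- clogit(case ~ x + strata(set), data = d_sub)
rbind(full = coef(fit_full), subsample = coef(fit_sub))

# Rule-of-thumb relative variance from the plot above, at a few control counts
1 + 1 / c(1, 5, 20, 63)
```

By the 1 + 1/m rule of thumb, 20 controls per case costs about 5% in variance compared with the asymptote, while the data set shrinks to roughly a third of its size.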

>
> With only 64 cases you cannot fit terribly complicated models. This
> holds whether you approach things conditionally using clogit or
> unconditionally using glm. Fourteen degrees of freedom for regression
> is probably pushing matters.  ridge() is helpful in taming overlarge
> regressor sets in clogit, but you'll need to use
> survival:::summary.coxph.penal() on the result (or tinker with the
> class attribute).
>
I am still letting the program run. Even for the case of 4 df, it has
not produced a result yet.
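
For comparison, the unconditional fit that Thomas mentioned might look roughly like the sketch below: an ordinary logistic regression with the stratum entered as a factor. The data are simulated and the names (d, stratum, x, alpha) are hypothetical; I am assuming a small number of strata that each contain many rows.

```r
set.seed(2)
n_strata <- 8     # hypothetical; my data have 64 strata
n_per    <- 500   # rows per stratum, so each stratum is large

# Synthetic data with a stratum-specific intercept and one covariate
d <- data.frame(stratum = factor(rep(seq_len(n_strata), each = n_per)),
                x       = rnorm(n_strata * n_per))
alpha  <- rnorm(n_strata, sd = 0.5)
d$case <- rbinom(nrow(d), 1, plogis(alpha[as.integer(d$stratum)] + 0.7 * d$x))

# Unconditional likelihood: one intercept per stratum is estimated directly
fit <- glm(case ~ x + stratum, family = binomial, data = d)
coef(summary(fit))["x", ]
```

This estimates the stratum effects instead of conditioning them away, which, as Thomas noted, is fine when each stratum is large.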

> BTW, when you say 'strata(___)', I hope you mean that you use
> something like 'strata( stratvar )' where stratvar is a factor that
> encodes the 64 levels.
>

Yes, that's what I mean. Thank you.
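
For anyone following along, a small sketch of the call form with the infert data used earlier in the thread, plus a check that the outcome is a valid status before fitting. The helper is_valid_status is a made-up name for illustration, not part of the survival package.

```r
library(survival)

# Hypothetical helper: clogit needs a logical or 0/1 status, not a proportion
is_valid_status <- function(y) is.logical(y) || all(y %in% c(0, 1))

# infert ships with R; case is coded 0/1 and stratum identifies the matched set
is_valid_status(infert$case)

# A continuous outcome on (0, 1), like Pin in this thread, fails the check
is_valid_status(runif(10))

# The stratifying variable goes inside strata()
fit <- clogit(case ~ spontaneous + induced + strata(stratum), data = infert)
coef(fit)
```

Running the check first would have caught the "Invalid status value" problem from my original post before the fit was attempted.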

> HTH,
>
> Chuck
>
>>
>> Thanks a lot
>>
>> Hien
>>
>> tlumley at u.washington.edu wrote:
>>>  On Fri, 18 Dec 2009, Hien Nguyen wrote:
>>>
>>> >  Dear Drs Winsemius and Berry,
>>> >
>>> >  Thanks a lot for your comments and suggestions on running my model.
>>> >  I am not just new to R but new to CLM as well. :( With your
>>> >  suggestions, I figured out that I had huge misunderstandings about
>>> >  the model and the data arrangement.
>>> >
>>> >  After my finals, I reread the related materials on CLM and
>>> >  rearranged the data in an appropriate way before running the model
>>> >  in R. This time, I have a dataset of more than 250,000 observations
>>> >  (created from more than 4000 responses) and a model of 15 predictors.
>>> >
>>> >  My question is how long it should take for the clogit command to
>>> >  run, because it has been running for more than 10 hours on a
>>> >  quad-core computer and still doesn't show any sign of being done or
>>> >  almost done. Is that OK, or does my command just not work?
>>>
>>>  If you have a lot of records with case=1 in a stratum, conditional
>>>  logistic regression will be extremely slow.   And unnecessary: maximizing
>>>  the unconditional likelihood is fine when the stratum sizes are large.
>>>
>>>  Note that a quad-core computer won't help. Only one core will be used in
>>>  the computations.
>>>
>>>       -thomas
>>>
>>>
>>>
>>>
>>> >  Thanks a lot for your response
>>> >
>>> >  Hien
>>> >
>>> >  Charles C. Berry wrote:
>>> > >  On Fri, 4 Dec 2009, David Winsemius wrote:
>>> > >
>>> > > >  On Dec 4, 2009, at 5:49 PM, Hien Nguyen wrote:
>>> > > >
>>> > > > >  Dear Dr. Winsemius,
>>> > > > >
>>> > > > >  Thank you very much for your reply.
>>> > > > >
>>> > > > >  I have tried many possible combinations (even with the model of
>>> > > > >  only 2 predictors) but it produces the same message. With more
>>> > > > >  than 4000 observations, I think 14 predictors might not be too
>>> > > > >  many.
>>> > > >
>>> > > >  It is what happens in the factor combinations that concern me. I am
>>> > > >  guessing that some of those predictors are factors. You really
>>> > > >  should not ask r-help questions without providing better
>>> > > >  descriptions of both the outcomes and the predictor variables.
>>> > > >
>>> > > > >  Although my dependent variable (Pin) is not discrete  (it ranges
>>> > > > >  from 0 to 1), I do not think it will create problems to the
>>> > > > >  estimation but I'm not sure
>>> > > >
>>> > > >  I would think it _would_ cause problems. As I understand it,
>>> > > >  conditional methods create contingency tables. Why are you using an
>>> > > >  outcome type that is not consistent with the fundamental regression
>>> > > >  assumptions of the clogit function?
>>> > > >
>>> > > >  I do not get that particular error when I munge the infert dataset
>>> > > >  to have case be a random uniform value, but I do get an error.
>>> > > >
>>> > > >  > infert$case <- runif(nrow(infert))
>>> > > >  > clogit(case~spontaneous+induced+strata(stratum),data=infert)
>>> > > >  Error in Surv(rep(1, 248L), case) : Invalid status value
>>> > >
>>> > >  David, I think you were on the right track. I get this:
>>> > >
>>> > >  -----------
>>> > >  > clogit(I(case*runif(length(case)))~spontaneous+induced+strata(ifelse(stratum>40,NA,stratum)),data=infert)
>>> > >  Error in fitter(X, Y, strats, offset, init, control, weights = weights,  :
>>> > >    NA/NaN/Inf in foreign function call (arg 6)
>>> > >  In addition: Warning messages:
>>> > >  1: In Surv(rep(1, 248L), I(case * runif(length(case)))) :
>>> > >    Invalid status value, converted to NA
>>> > >  2: In fitter(X, Y, strats, offset, init, control, weights = weights,  :
>>> > >    Ran out of iterations and did not converge
>>> > >  ------------
>>> > >
>>> > >  which looks pretty much the same as Hien's error msg
>>> > >
>>> > >  So Hien needs to create a logical status value.
>>> > >
>>> > >  Chuck
>>> > >
>>> > >  p.s.
>>> > >
>>> > >  > sessionInfo()
>>> > >  R version 2.10.0 (2009-10-26)
>>> > >  i386-pc-mingw32
>>> > >
>>> > >  locale:
>>> > >  [1] LC_COLLATE=English_United States.1252
>>> > >  [2] LC_CTYPE=English_United States.1252
>>> > >  [3] LC_MONETARY=English_United States.1252
>>> > >  [4] LC_NUMERIC=C
>>> > >  [5] LC_TIME=English_United States.1252
>>> > >
>>> > >  attached base packages:
>>> > >  [1] splines   stats     graphics  grDevices utils     datasets  methods
>>> > >  [8] base
>>> > >
>>> > >  other attached packages:
>>> > >  [1] survival_2.35-7
>>> > >
>>> > >  loaded via a namespace (and not attached):
>>> > >  [1] tools_2.10.0
>>> > >
>>> > > >  So I certainly would not have proceeded to submit a full analysis to
>>> > > >  clogit if I could not get a test case to run under the situation you
>>> > > >  propose.
>>> > > >
>>> > > >  --
>>> > > >  David
>>> > > >
>>> > > > >  I have checked the collinearity among predictors and they are all
>>> > > > >  < 0.5 (which I think is OK). Do you know what else could make this
>>> > > > >  errors?
>>> > > > >
>>> > > > >  Thanks a lot
>>> > > > >
>>> > > > >  Hien Nguyen
>>> > > > >
>>> > > > >  David Winsemius wrote:
>>> > > > > >  On Dec 4, 2009, at 9:22 AM, Hien Nguyen wrote:
>>> > > > > >
>>> > > > > > >  Dear R-helpers,
>>> > > > > > >
>>> > > > > > >  I am very new to R and trying to run the conditional logit
>>> > > > > > >  model using "clogit " command.
>>> > > > > > >  I have more than 4000 observations in my dataset and try to
>>> > > > > > >  predict the dependent variable from 14 independent variables.
>>> > > > > > >  My command is as follows
>>> > > > > > >
>>> > > > > > >  clmtest1 <-
>>> > > > > > >  clogit(Pin~Income+Bus+Pop+Urbpro+Health+Student+Grad+NE+NW+NCC+SCC+CH+SE+MRD+strata(IDD),data=clmdata)
>>> > > > > > >
>>> > > > > > >  However, it produces the following errors:
>>> > > > > > >
>>> > > > > > >  Error in fitter(X, Y, strats, offset, init, control, weights = weights, :
>>> > > > > > >  NA/NaN/Inf in foreign function call (arg 6)
>>> > > > > > >  In addition: Warning messages:
>>> > > > > > >  1: In Surv(rep(1, 4096L), Pinmig) : Invalid status value, converted to NA
>>> > > > > > >  2: In fitter(X, Y, strats, offset, init, control, weights = weights, :
>>> > > > > > >  Ran out of iterations and did not converge
>>> > > > > > >
>>> > > > > > >  I search the error message from R forums but it does not
>>> > > > > > >  say anything for Conditional Logit Model.
>>> > > > > >
>>> > > > > >  With that many predictors in a small dataset, you may have
>>> > > > > >  created matrix singularities. Perhaps you created a stratum
>>> > > > > >  where all of the subjects experience the event and others where
>>> > > > > >  none did so. The coefficients might be driven to infinities. Try
>>> > > > > >  simplifying the model.
>>> > > > > >
>>> > > > > > >  Please check for me what it says and what should I do
>>> > > > > > >  to solve it.
>>> > > >
>>> > > >  David Winsemius, MD
>>> > > >  Heritage Laboratories
>>> > > >  West Hartford, CT
>>> > > >
>>> > > >  ______________________________________________
>>> > > >  R-help at r-project.org mailing list
>>> > > >  https://stat.ethz.ch/mailman/listinfo/r-help
>>> > > >  PLEASE do read the posting guide
>>> > > >  http://www.R-project.org/posting-guide.html
>>> > > >  and provide commented, minimal, self-contained, reproducible code.
>>> > >
>>> > >  Charles C. Berry                            (858) 534-2098
>>> > >                                              Dept of Family/Preventive Medicine
>>> > >  E mailto:cberry at tajo.ucsd.edu              UC San Diego
>>> > >  http://famprevmed.ucsd.edu/faculty/cberry/  La Jolla, San Diego 92093-0901
>>> >
>>>  Thomas Lumley            Assoc. Professor, Biostatistics
>>>  tlumley at u.washington.edu    University of Washington, Seattle
>>>
>>
>>
>>
>
> Charles C. Berry                            (858) 534-2098
>                                             Dept of Family/Preventive Medicine
> E mailto:cberry at tajo.ucsd.edu              UC San Diego
> http://famprevmed.ucsd.edu/faculty/cberry/  La Jolla, San Diego 92093-0901
>
>



