[R] Grouped Logistic (Or conditional Logistic.)

David Winsemius dwinsemius at comcast.net
Thu Sep 17 20:49:27 CEST 2009


On Sep 17, 2009, at 2:06 PM, (Ted Harding) wrote:

> On 17-Sep-09 17:28:16, Noah Silverman wrote:
>> Hi,
>> I'm not sure of the correct nomenclature or function for what
>> I'm trying to do.
>>
>> I'm interested in calculating a logistic regression on a binary
>> dependent variable (True,False).
>>
>> There are a few ways to easily do this in R. Both SVM and GLM
>> work easily.
>>
>> The part that I want to add is "group wise" awareness.  So that
>> the algorithm computes the coefficients to maximize the likelihood
>> of a "True" label per group.
>>
>> A toy explanation is probably best.  I've been looking at horse
>> racing models as a fun field to learn about statistics and R.
>>
>> So, for this example, let's assume the following:
>> 100 horses in our stable
>> 10 horses per race
>> 75 races this season (some horses race more than once.)
>>
>> The independent variables are things about a horse (average speed,
>> number of past wins, etc.)
>> The dependent variable is (Win, Lose) represented by (1,0)
>>
>> As mentioned above, an SVM or GLM will quickly work to estimate
>> coefficients and probability of a Win. I'd like to take it further
>> and estimate the probability of a win, but look at it per race.
>>
>> I'm NOT interested in the group label as a final part of the model.
>> I don't want a separate set of coefficients for each group. I just
>> want the iterative algorithm to work toward maximizing the likelihood
>> PER GROUP as an average.
>>
>> I looked extensively through rseek.org for things like "grouped
>> logistic" and "nested logistic".  I couldn't seem to find anything
>> that does this.  I'm probably naming it wrong.
>>
>> I assume that a MANUAL iteration concept would be to:
>>     1) Pick a coefficient
>>     2) Calculate the resulting probability for each horse.
>>     3) Measure the strength of the result for each race (sum them
>> together or average them?)
>>     4) Adjust coefficient and repeat
>>
>> Surely there must be some standard function in a library that will
>> do this.
>>
>> Can any of the stat gurus here offer some suggestions?
>>
>> Thanks!
>> --
>> Noah
>
> In the context of your "fun example", you have a fundamental problem
> in that (if I've understood your statement of it correctly) you will
> have more than one of your horses in the same race (apparently 10).
>
> Therefore, one of them winning excludes any of the others winning in
> that same race, so their results are not independent of each
> other.
>
> Also, at least in real life, the probability that a given horse will
> win in a particular race depends not only on the covariates "per horse"
> (such as your average speed, number of past wins, etc.), and indeed
> on the condition of the race-course at the time, but also (and usually
> strongly) on the characteristics of the other horses in the same race.
>
> So a simple logistic model of the kind you seem to be proposing would
> certainly not be realistic!
>
> I would be happier thinking about your problem in the context of a
> different kind of example ...

Ted;

Would your set of concerns be addressed if the OP switched to a
proportional odds logistic regression framework? Harrell discusses
such in his RMS text.
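
For comparison, the alternative named in the subject line, conditional
logistic regression, treats each race as a stratum in which exactly one
horse wins; in R, clogit() in the survival package with a strata() term
fits models of that kind. It is a different model from the proportional
odds one raised above. Below is a minimal sketch of the per-race
likelihood, written in plain Python so that it is self-contained, and not
a definitive implementation. The single covariate (speed), the toy races,
and all function names are hypothetical, made up for illustration only.

```python
import math

def race_win_probs(speeds, beta):
    """Within-race win probabilities: softmax of beta * speed.

    The probabilities sum to 1 over the field, so one horse's
    chance of winning depends on every other horse in the race.
    """
    scores = [beta * s for s in speeds]
    top = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]

def conditional_loglik(races, beta):
    """Sum over races of log P(observed winner | horses in that race)."""
    return sum(math.log(race_win_probs(speeds, beta)[winner])
               for speeds, winner in races)

def fit_beta(races, lo=-10.0, hi=10.0, iters=200):
    """Ternary search for the maximizing beta.

    Assumes the maximum lies inside [lo, hi]; the log-likelihood is
    concave in beta, so repeatedly shrinking the interval is enough.
    """
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if conditional_loglik(races, m1) < conditional_loglik(races, m2):
            lo = m1
        else:
            hi = m2
    return (lo + hi) / 2

# Hypothetical toy data: (speed of each horse, index of the winner).
races = [
    ([1.0, 0.5, 0.2], 0),
    ([0.3, 0.9, 0.4], 1),
    ([0.8, 0.1, 0.6], 2),  # an upset: the fastest horse did not win
    ([0.2, 0.7, 0.5], 1),
]

beta_hat = fit_beta(races)
print("estimated beta:", beta_hat)
print("win probabilities, race 1:", race_win_probs(races[0][0], beta_hat))
```

Because the probabilities are normalized within each race, one horse
winning necessarily lowers the chances of the others, and the loop in
fit_beta is the "adjust coefficient and repeat" idea with the per-race
log-probabilities summed. One shared coefficient is estimated; there is
no separate set of coefficients per race.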

--

David Winsemius, MD
Heritage Laboratories
West Hartford, CT



