[R] Define a glm object with user-defined coefficients (logistic regression, family="binomial")

Sun Nov 14 16:36:31 CET 2010

On Nov 14, 2010, at 8:35 AM, Jürgen Biedermann wrote:

> Hi David,
>
> Thank you very much for your answer. It helps me a lot. The offset  
> argument was the key (I didn't understand the description in the R- 
> help file)
> Rereading my email I found a mistake in the definition of my  
> formula. Instead of p(y) = exp(a + c1*x1 + c2*x2), it has to be:  
> p(y) = exp(a + c1*x1 + c2*x2)/(1+exp(a + c1*x1 + c2*x2)), but  
> actually that doesn't matter much in our case.

Oh, but I think it does matter (at least in the situation where  
p(event) are above 1/20 or so. At lower probabilities it wouldn't  
matter very much.  You are posting above the default logistic equation  
used when the distribution is specified as "binomial" which I meant to  
include but forgot. Had I remembered to include the ...,  
family="binomial" it would have implied what you write above. So:

firstfit <- glm( c(event,no_event) ~ 1 + x1 +x2, data=dfrm,  
family="binomial")

# Is the model you cite above and I meant to write:

forcedfit <- glm( c(event,no_event) ~ 1 + offset(logoff), data=dfrm,  
family="binomial")

(I think the only reason this went unremarked by the usually eagle- 
eyed rhelp viewership is the weekend has a decreased viewing activity.)

>
>> The anova results would have not much interpretability in this  
>> setting. You would be testing for the Intercept being zero under  
>> very artificial conditions. You have eliminated much statistical  
>> meaning by forcing the form of the results.
>
> Imagine the following. I develop a model on one dataset and want to  
> validate it on another. So I could use the coefficents trained on  
> the first dataset to define a glm model (named: ModelV) on the  
> second dataset. Then i could test this model against a NULL model  
> (named: ModelV0) of the second dataset with anova(ModelV, ModelV0,  
> test="Chisq").

Assuming this is a split dataset from run 1 in a split datast,  why  
not develop a score from the the first set and run it against a model  
that includes the score and the original variables.

Model 1: mod1 <- glm( c(event,no_event) ~ 1 + x1 +x2, data=dfrm1)
>> dfrm2$scr1 <- with(dfrm2, Intercept + c1*x1 + c2*x2))

Model2: mod2 <- glm( c(event,no_event) ~ 0 + x1 +x2 +scr1, data=dfrm2)

(In this manner impact from the deviations induced by varying X values  
would be assessed as evidence of potential model mis-specification.  
This would be most successful with larger datasets. It would probably  
be a better use of a smaller dataset, however, to get a bootstrapped  
validation of a model. Or to do a one-tenth holdout and do the  
validations run ten times.  Harrells' "Regression Modeling Strategies"  
gives a good discussion of validation strategies and his rms amd Hmisc  
packages provide the functions to perform the validation and display  
results neatly.)

>
> Best Wishes
> Jürgen
>
> -- 
> -----------------------------------
> Jürgen Biedermann
> Blücherstraße 56
> 10961 Berlin-Kreuzberg
> Mobil: +49 176 247 54 354
> Home: +49 30 250 11 713
> e-mail: juergen.biedermann at gmail.com
>
>
> --------- Korrespondenz ----------
>
> Betreff: Re: [R] Define a glm object with user-defined coefficients  
> (logistic regression, family="binomial")
> Von: David Winsemius <dwinsemius at comcast.net>
> An: Jürgen Biedermann <juergen.biedermann at googlemail.com>
> Datum: 13.11.2010 17:15
>>
>> On Nov 13, 2010, at 7:43 AM, Jürgen Biedermann wrote:
>>
>>> Hi there,
>>>
>>> I just don't find the solution on the following problem. :(
>>>
>>> Suppose I have a dataframe with two predictor variables (x1,x2)  
>>> and one depend binary variable (y). How is it possible to define a  
>>> glm object (family="binomial") with a user defined logistic  
>>> function like p(y) = exp(a + c1*x1 + c2*x2) where c1,c2 are the  
>>> coefficents which I define. So I would like to do no fitting of  
>>> the coefficients. Still, I would like to define a GLM object  
>>> because I could then easily use other functions which need a glm  
>>> object as argument (e.g. I could use the anova,
>>
>> The anova results would have not much interpretability in this  
>> setting. You would be testing for the Intercept being zero under  
>> very artificial conditions. You have eliminated much statistical  
>> meaning by forcing the form of the results.
>>
>>> summary functions).
>>
>> # Assume dataframe name is dfrm with variables event, no_event, x1,  
>> x2, and further assume c1 and c2 are also defined:
>>
>> dfrm$logoff <- with(dfrm, log(c1*x1 + c2*x2))
>> forcedfit <- glm( c(event,no_event) ~ 1 + offset(logoff), data=dfrm)
>>
>> (Obviously untested.)
>>
>>>
>>> Thank you very much! Greetings
>>> Jürgen
>>>
>>> -- 
>>> -----------------------------------
>>> Jürgen Biedermann
>>
>>
>> David Winsemius, MD
>> West Hartford, CT
>>

David Winsemius, MD
West Hartford, CT