[R] Unexpected behavior with weights in binomial glm()

Sun Sep 30 16:11:03 CEST 2012

I haven't followed this thread closely, but if perfect separation in a
binomial glm is the problem, it is well documented; see, e.g.,

http://www.ats.ucla.edu/stat/mult_pkg/faq/general/complete_separation_logit_models.htm

This presumably explains your concerns about coefficient agreement.
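
A minimal sketch of what separation does to the toy example from the
original post, offered as an illustration under stated assumptions and not
as a definitive diagnosis. It first tabulates the data to show the empty
cell, then refits with a capped number of iterations to show the INDEP
slope keeps growing, and finally compares fitted probabilities from the
raw and aggregated fits. The helper name slope_after, the tolerance
epsilon = 1e-14, and the maxit values 5, 10, 25 are illustrative choices,
not anything taken from the thread.

```r
# Toy data from the original post
d <- data.frame(RESP  = c(rep(1, 5), rep(0, 5)),
                INDEP = c(1, 1, 1, 1, 0, 0, 0, 0, 0, 0))

# Every INDEP == 1 row has RESP == 1: the (RESP = 0, INDEP = 1) cell is empty
tab <- table(RESP = d$RESP, INDEP = d$INDEP)
print(tab)

# Hypothetical helper: INDEP slope after at most `maxit` IRLS iterations,
# using a tolerance tight enough that the fit does not stop early
slope_after <- function(maxit) {
  fit <- suppressWarnings(
    glm(RESP ~ INDEP, family = binomial, data = d,
        control = glm.control(epsilon = 1e-14, maxit = maxit)))
  unname(coef(fit)["INDEP"])
}
slopes <- sapply(c(5, 10, 25), slope_after)
print(slopes)  # grows with the iteration cap instead of settling

# Aggregated version of the same data, both fitted with default settings
dAgg <- aggregate(list(WT = d$RESP),
                  by = list(RESP = d$RESP, INDEP = d$INDEP),
                  FUN = length)
fit_raw <- suppressWarnings(glm(RESP ~ INDEP, family = binomial, data = d))
fit_agg <- suppressWarnings(glm(RESP ~ INDEP, family = binomial,
                                data = dAgg, weights = WT))

# Compare fitted probabilities at INDEP = 0 and INDEP = 1
nd <- data.frame(INDEP = c(0, 1))
p_raw <- predict(fit_raw, nd, type = "response")
p_agg <- predict(fit_agg, nd, type = "response")
print(cbind(p_raw, p_agg))
```

If the slope has no finite maximum likelihood estimate, the value that
glm() reports reflects where the iterations happened to stop, so two
equivalent fits can disagree on the coefficient while agreeing closely
on the fitted probabilities and the deviance.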

-- Bert

On Sun, Sep 30, 2012 at 4:47 AM, Josh Browning
<rockclimber112358 at gmail.com> wrote:
> Hi David,
>
> Yes, I agree that the results are "very similar" but I don't
> understand why they are not exactly equal given that the data sets are
> identical.
>
> And yes, this 1% numerical difference is hugely important to me.  I
> have another data set (much larger than this toy example) that works
> on the aggregated data (returning a coefficient of about 1) but
> returns the warning about perfect separation on the non-aggregated
> data (and a coefficient of about 1e15).  So, I'd at least like to be
> able to understand where this numerical difference is coming from and,
> preferably, a way to tweak my glm() runs (possibly adjusting the
> numerical precision somehow???) so that this doesn't happen.
>
> Josh
>
> On Sat, Sep 29, 2012 at 7:50 PM, David Winsemius <dwinsemius at comcast.net> wrote:
>>
>> On Sep 29, 2012, at 7:10 AM, Josh Browning wrote:
>>
>>> Hi useRs,
>>>
>>> I'm experiencing something quite weird with glm() and weights, and
>>> maybe someone can explain what I'm doing wrong.  I have a dataset
>>> where each row represents a single case, and I run
>>> glm(...,family="binomial") and get my coefficients.  However, some of
>>> my cases have the exact same values for predictor variables, so I
>>> should be able to aggregate up my data frame and run glm(...,
>>> family="binomial",weights=wts) and get the same coefficients (maybe
>>> this is my incorrect assumption, but I can't see why it would be).
>>> Anyway, here's a minimal working example:
>>>
>>>> d = data.frame( RESP=c(rep(1,5),rep(0,5)), INDEP=c(1,1,1,1,0,0,0,0,0,0) )
>>>> glm( RESP ~ INDEP, family="binomial", data=d )
>>>
>>> Call:  glm(formula = RESP ~ INDEP, family = "binomial", data = d)
>>>
>>> Coefficients:
>>> (Intercept)        INDEP
>>>     -1.609       21.176
>>>
>>> Degrees of Freedom: 9 Total (i.e. Null);  8 Residual
>>> Null Deviance:      13.86
>>> Residual Deviance: 5.407        AIC: 9.407
>>>> dAgg = aggregate( d$RESP, by=list(d$RESP, d$INDEP), FUN=length )
>>>> colnames(dAgg) = c("RESP","INDEP","WT")
>>>> glm( RESP ~ INDEP, family="binomial", data=dAgg, weights=WT )
>>>
>>> Call:  glm(formula = RESP ~ INDEP, family = "binomial", data = dAgg,
>>>    weights = WT)
>>>
>>> Coefficients:
>>> (Intercept)        INDEP
>>>     -1.609       20.975
>>>
>>> Degrees of Freedom: 2 Total (i.e. Null);  1 Residual
>>> Null Deviance:      13.86
>>> Residual Deviance: 5.407        AIC: 9.407
>>
>> Those two results look very similar, and the data situation seems somewhat extreme. Is the concern the 1% numerical difference in the regression coefficient? Am I reading you correctly?
>>
>> --
>> David Winsemius, MD
>> Alameda, CA, USA
>>
>

-- 

Bert Gunter
Genentech Nonclinical Biostatistics