[R] Unexpected behavior with weights in binomial glm()

Sun Sep 30 13:47:22 CEST 2012

Hi David,

Yes, I agree that the results are "very similar" but I don't
understand why they are not exactly equal given that the data sets are
identical.

And yes, this 1% numerical difference is hugely important to me.  I
have another data set (much larger than this toy example) that works
on the aggregated data (returning a coefficient of about 1) but
returns the warning about perfect separation on the non-aggregated
data (and a coefficient of about 1e15).  So, I'd at least like to be
able to understand where this numerical difference is coming from and,
preferably, a way to tweak my glm() runs (possibly adjusting the
numerical precision somehow???) so that this doesn't happen.

Josh

On Sat, Sep 29, 2012 at 7:50 PM, David Winsemius <dwinsemius at comcast.net> wrote:
>
> On Sep 29, 2012, at 7:10 AM, Josh Browning wrote:
>
>> Hi useRs,
>>
>> I'm experiencing something quite weird with glm() and weights, and
>> maybe someone can explain what I'm doing wrong.  I have a dataset
>> where each row represents a single case, and I run
>> glm(...,family="binomial") and get my coefficients.  However, some of
>> my cases have the exact same values for predictor variables, so I
>> should be able to aggregate up my data frame and run glm(...,
>> family="binomial",weights=wts) and get the same coefficients (maybe
>> this is my incorrect assumption, but I can't see why it would be).
>> Anyways, here's a minimum working example below:
>>
>>> d = data.frame( RESP=c(rep(1,5),rep(0,5)), INDEP=c(1,1,1,1,0,0,0,0,0,0) )
>>> glm( RESP ~ INDEP, family="binomial", data=d )
>>
>> Call:  glm(formula = RESP ~ INDEP, family = "binomial", data = d)
>>
>> Coefficients:
>> (Intercept)        INDEP
>>     -1.609       21.176
>>
>> Degrees of Freedom: 9 Total (i.e. Null);  8 Residual
>> Null Deviance:      13.86
>> Residual Deviance: 5.407        AIC: 9.407
>>> dAgg = aggregate( d$RESP, by=list(d$RESP, d$INDEP), FUN=length )
>>> colnames(dAgg) = c("RESP","INDEP","WT")
>>> glm( RESP ~ INDEP, family="binomial", data=dAgg, weights=WT )
>>
>> Call:  glm(formula = RESP ~ INDEP, family = "binomial", data = dAgg,
>>    weights = WT)
>>
>> Coefficients:
>> (Intercept)        INDEP
>>     -1.609       20.975
>>
>> Degrees of Freedom: 2 Total (i.e. Null);  1 Residual
>> Null Deviance:      13.86
>> Residual Deviance: 5.407        AIC: 9.407
>
> Those two results look very similar and it is with a data situation that seems somewhat extreme. The concern is for the 1% numerical  difference in the regression coefficient? Am I reading you correctly?
>
> --
> David Winsemius, MD
> Alameda, CA, USA
>