[R] Question on binomial data

ehud cohen ehudco.list at gmail.com
Wed Apr 22 06:57:49 CEST 2009


David,

Faraway suggests using the Hosmer-Lemeshow test in the case of a binary
response, and discusses the inadequacy of Wald statistics. However, I'm not
sure it applies here, given the limited number of cases.
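
A minimal sketch of what such a check could look like on the data quoted
below, assuming the ResourceSelection package (which provides hoslem.test())
is available; with only 17 observations any grouping is very coarse:

library(ResourceSelection)            # assumption: package is installed
l <- glm(p ~ w, family = binomial)    # p, w as in the original post below
hoslem.test(l$y, fitted(l), g = 5)    # observed 0/1 vs. fitted probabilities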

Thanks, Ehud.

On Wed, Apr 22, 2009 at 2:04 AM, David Winsemius <dwinsemius at comcast.net> wrote:
> Surely Faraway does not suggest using the Wald statistic in preference to
> the deviance?
>
> Even if the distribution of the deviance is not exactly chi-square, it appears
> generally accepted that comparing the difference in deviance to a chi-square
> distribution is better than using the ratio of the beta to se(beta), which is
> what that "Pr(>|z|)" number is based on.
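>
> A minimal sketch of the two, assuming l is the glm(p ~ w, family = binomial)
> fit from the original post:
>
> anova(l, test = "Chisq")                  # deviance drop vs. chi-square
> summary(l)$coefficients["w", "Pr(>|z|)"]  # the Wald z-based p-value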
>
> Your permutation results look sensible and could conceivably be considered
> the gold standard.
>
> --
> David
>
>
> On Apr 21, 2009, at 5:31 PM, ehud cohen wrote:
>
>> I thought of testing the difference in deviance between the null model
>> and the fitted model, assuming it is distributed as chi-square. However,
>> Faraway writes that if the outcome is binary, the distribution of the
>> deviance is far from chi-square.
>> I've done a permutation test:
>>
>> ## N close to the upper limit: there are only choose(17, 5) = 6188
>> ## distinct arrangements of the TRUE/FALSE data I have
>> N <- 5000
>> dev <- rep(0, N)
>> for (i in 1:N) {
>>   ## refit with the response randomly permuted
>>   l1 <- glm(sample(p) ~ w, family = binomial)
>>   dev[i] <- l1$deviance
>> }
>> ## fraction of permuted fits with deviance below the observed one
>> print(mean(dev < l$deviance))
>>
>> and the outcome is 0.005, which is close to the t-test result.
>>
>> I've repeated the same procedure, computing the statistic on the z-value
>> from summary(l1) each time instead of the deviance, and got a comparable
>> result.
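>>
>> A minimal sketch of that variant (assuming p, w and the observed fit l as
>> in the original post; z.obs and z.perm are just names used here):
>>
>> z.obs  <- summary(l)$coefficients["w", "z value"]
>> z.perm <- replicate(5000, {
>>   l1 <- glm(sample(p) ~ w, family = binomial)       # permute the response
>>   summary(l1)$coefficients["w", "z value"]
>> })
>> mean(abs(z.perm) >= abs(z.obs))   # two-sided permutation p-value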
>>
>> I think it means that David is right: the Pr(>|z|) in the glm output does
>> not mean much here. I still don't know what it does mean.
>>
>> Regarding your suggestion of using car's Anova:
>>
>>> Anova(l)
>>
>> Anova Table (Type II tests)
>>
>> Response: p
>>  LR Chisq Df Pr(>Chisq)
>> w   9.4008  1   0.002169 **
>>
>> which is identical to:
>>
>> pchisq(l$null.deviance - l$deviance, 1, lower.tail = FALSE)
>>
>> which seems too low, probably because of the binary response.
>>
>> Would you think the permutation method is appropriate to use in this case?
>> And could it be extended to a case with several covariates?
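>>
>> One way it might extend (a sketch only; w2 is a hypothetical second
>> covariate, not part of the data below): permute the response and record the
>> deviance drop of the full model, which tests the joint null that no
>> covariate matters.
>>
>> l.full    <- glm(p ~ w + w2, family = binomial)     # w2 is hypothetical
>> drop.obs  <- l.full$null.deviance - l.full$deviance
>> drop.perm <- replicate(5000, {
>>   lp <- glm(sample(p) ~ w + w2, family = binomial)  # refit on permuted p
>>   lp$null.deviance - lp$deviance
>> })
>> mean(drop.perm >= drop.obs)   # permutation p-value for the joint test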
>>
>>
>>
>> On Tue, Apr 21, 2009 at 10:34 PM,  <markleeds at verizon.net> wrote:
>>>
>>> Hi: I would wait for one of the guRus to say something, but my take (take
>>> it with a grain of salt) is that the results are not so contradictory. The
>>> test of the significance of the coefficient in the GLM gives 0.06, and the
>>> test that the means are different gives a p-value of 0.004. A couple of
>>> reasons why this might not be so contradictory:
>>>
>>> A) The t-test gives greater significance, but it's not really testing the
>>> same thing. The t-test is only testing that the means of the covariate
>>> differ between the two groups; the GLM is testing whether the log odds of
>>> the outcome (pass vs. fail) are linearly related to the covariate.
>>>
>>> b) Your t-test is a little shaky because one of the two samples has only
>>> five observations, and I'm not clear on whether it's assuming equal
>>> variances and pooling (t.test has a var.equal argument; the default is
>>> FALSE, i.e. the Welch test; see the sketch just below). Either way, that's
>>> not a large sample size.
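>>>
>>> A minimal sketch of the two t.test variants, using the p and w vectors from
>>> the original post below:
>>>
>>> t.test(w[p], w[!p])                     # default: Welch, var.equal = FALSE
>>> t.test(w[p], w[!p], var.equal = TRUE)   # classical pooled-variance t-test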
>>>
>>> c) When you test significance in a GLM, you need to compare the deviance
>>> of the model to the deviance of the nested null model. John Fox's book
>>> describes this, but I don't think it's the same as looking at the
>>> significance column in the table output of glm; that's a Wald test, not
>>> the deviance comparison (essentially a likelihood ratio test, I think).
>>> With small sample sizes, the differences between these various tests can
>>> be large. Check out John Fox's text for a nice description of testing in
>>> the generalized linear model framework; you can use Anova from his car
>>> package to do this, as in the sketch below.
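>>>
>>> A minimal sketch of that comparison, assuming l is the glm(p ~ w,
>>> family = binomial) fit from the original post:
>>>
>>> l0 <- glm(p ~ 1, family = binomial)   # nested null model
>>> anova(l0, l, test = "Chisq")          # likelihood ratio / deviance test
>>> library(car)
>>> Anova(l)                              # same LR chi-square (Type II)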
>>>
>>> Hopefully someone else will say something though, because I'd be curious
>>> to see where I'm right or wrong, or to learn something new. Good luck.
>>>
>>> On Apr 21, 2009, ehud cohen <ehudco.list at gmail.com> wrote:
>>>
>>> Hi,
>>>
>>> We have an experiment with a pass/fail outcome, and a continuous
>>> parameter which may contribute to the outcome.
>>>
>>> First, we've analyzed it by:
>>>
>>> p=c(F,T,F,F,F,T,T,T,T,T,T,T,F,T,T,T,T);
>>> w=c(53,67,59,59,53,89,72,56,65,63,62,58,59,72,61,68,63);
>>> l<-glm(p~w,family=binomial)
>>> summary(l)
>>>
>>> Which turned out to be non-significant.
>>>
>>> Then, we thought of comparing the parameter between the two groups
>>> (passed vs. failed):
>>>
>>> t.test(w[which(p)],w[which(!p)],alternative="two.sided")
>>>
>>> which turned out to be highly significant.
>>>
>>> I'd appreciate some insight...
>>>
>>> Thanks, Ehud.
>>>
>>>
>>
>
> David Winsemius, MD
> Heritage Laboratories
> West Hartford, CT
>
>



