[R] How to test a difference in ratios of count data in R

David Winsemius dwinsemius at comcast.net
Wed Sep 28 22:54:46 CEST 2016

```> On Sep 28, 2016, at 9:49 AM, Greg Snow <538280 at gmail.com> wrote:
>
> There are multiple ways of doing this, but here are a couple.
>
> To just test the fixed effect of treatment you can use the glm function:
>
> replicate treatment n X
> 1 A 32 4
> 1 B 33 18
> 2 A 20 6
> 2 B 21 18
> 3 A 7 0
> 3 B 8 4
>
> fit1 <- glm( cbind(X,n-X) ~ treatment, data=test, family=binomial)
> summary(fit1)
>
> Note that the default baseline value may differ between R and SAS,
> which would result in a reversed sign on the slope coefficient (and
> different intercept).
>
> To include replicate as a random effect you need an additional
> package, here I use lme4 and the glmer function:
>
> library(lme4)
> fit2 <- glmer( cbind(X, n-X) ~ treatment + (1|replicate), data=test,
> family=binomial)
> summary(fit2)
>
>
>
> On Tue, Sep 27, 2016 at 9:03 PM, Shuhua Zhan <szhan at uoguelph.ca> wrote:
>> Hello R-experts,
>> I am interested to determine if the ratio of counts from two groups differ across two distinct treatments. For example, we have three replicates of treatment A, and three replicates of treatment B. For each treatment, we have counts X from one group and counts Y from another group. My understanding is that that GLIMMIX procedure in SAS can calculate whether the ratio of counts in one group (X/X+Y) significantly differs between treatments.
>>
>> I think this is the way you do it in SAS. The replicate and treatment variables are self-explanatory. The first number (n) refers to the total counts X + Y; the second number (X) refers to the counts X. Is there a way to do this in R? Since we have 20,000 datasets to be tested, it may be easier to retrive the significant test as the given dataset below and its p>F value and mean ratios of treatments in R than SAS.
>>
>>
>> data test;
>> input replicate treatment\$ n X;
>> datalines;
>> 1 A 32 4
>> 1 B 33 18
>> 2 A 20 6
>> 2 B 21 18
>> 3 A 7 0
>> 3 B 8 4
>> ;
>>

Greg has already shown you how that is done in R and how to do logistic regression:

#  I usually think of Poisson regression when I hear a desire is to model ratios of counts that have a denominator. The log(sample_size) is supplied as an offset to correct for the variation in size of subsamples.

fit1 <- glm( X ~ treatment+offset(log(n)), data=test, family=poisson)
summary(fit1)

#  And the lme4 analogue with replication:

library(lme4)
fit2 <- glmer( X ~ treatment + offset(log(n))+ (1|replicate), data=test,
family=poisson)
summary(fit2)
#----output----
Generalized linear mixed model fit by maximum likelihood (Laplace  Approximation)
[glmerMod]
Family: poisson  ( log )
Formula: X ~ treatment + offset(log(n)) + (1 | replicate)
Data: test

AIC      BIC   logLik deviance df.resid
31.9     31.3    -13.0     25.9        3

Scaled residuals:
Min      1Q  Median      3Q     Max
-1.0504 -0.4146 -0.3487  0.3956  1.0791

Random effects:
Groups    Name        Variance Std.Dev.
replicate (Intercept) 0.03159  0.1777
Number of obs: 6, groups:  replicate, 3

Fixed effects:
Estimate Std. Error z value Pr(>|z|)
(Intercept)  -1.7875     0.3372  -5.301 1.15e-07 ***
treatmentB    1.3365     0.3529   3.787 0.000152 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Correlation of Fixed Effects:
(Intr)
treatmentB -0.838

Compare with the binomial model:
#============

fitBin <- glmer( cbind(X,n-X) ~ treatment +  (1|replicate), data=test,
family=binomial)
coef(fitBin)
#----
\$replicate
(Intercept) treatmentB
1  -2.0487694   2.364695
2  -0.9908556   2.364695
3  -2.1844435   2.364695

attr(,"class")
[1] "coef.mer"
#-----
summary(fitBin)
#---------
Generalized linear mixed model fit by maximum likelihood (Laplace  Approximation)
[glmerMod]
Family: binomial  ( logit )
Formula: cbind(X, n - X) ~ treatment + (1 | replicate)
Data: test

AIC      BIC   logLik deviance df.resid
30.1     29.4    -12.0     24.1        3

Scaled residuals:
Min       1Q   Median       3Q      Max
-0.88757 -0.35065 -0.03137  0.26897  0.67505

Random effects:
Groups    Name        Variance Std.Dev.
replicate (Intercept) 0.4123   0.6421
Number of obs: 6, groups:  replicate, 3

Fixed effects:
Estimate Std. Error z value Pr(>|z|)
(Intercept)  -1.7442     0.5438  -3.208  0.00134 **
treatmentB    2.3647     0.4741   4.988 6.11e-07 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Correlation of Fixed Effects:
(Intr)
treatmentB -0.568

The binomial model has a logit link. Your glimmix procedure appears to have a gaussian/normal distributional assumption and an identity link by default. If we run this using those assumptions in lme4::glmer we get these results (with a warning that in this case we can overlook since the results with lmer turned out to be identical)
#--------
fitNorm <- glmer( I(X/n) ~ treatment +  (1|replicate), data=test,
family=gaussian)

#-------
Warning message:
In glmer(I(X/n) ~ treatment + (1 | replicate), data = test, family = gaussian) :
calling glmer() with family=gaussian (identity link) as a shortcut to lmer() is deprecated; please call lmer() directly
> coef(fitNorm); summary(fitNorm)
\$replicate
(Intercept) treatmentB
1 0.091096925  0.4925325
2 0.324579602  0.4925325
3 0.009323473  0.4925325

attr(,"class")
[1] "coef.mer"
Linear mixed model fit by REML ['lmerMod']
Formula: I(X/n) ~ treatment + (1 | replicate)
Data: test

REML criterion at convergence: -4.2

Scaled residuals:
Min      1Q  Median      3Q     Max
-0.7864 -0.4278 -0.1152  0.5143  0.8246

Random effects:
Groups    Name        Variance Std.Dev.
replicate (Intercept) 0.027895 0.16702
Residual              0.002356 0.04854
Number of obs: 6, groups:  replicate, 3

Fixed effects:
Estimate Std. Error t value
(Intercept)  0.14167    0.10042   1.411
treatmentB   0.49253    0.03963  12.427

Correlation of Fixed Effects:
(Intr)
treatmentB -0.197

That's (probably) the model to compare to your SAS results if my reading of the SAS Proc GLIMMIX manual page is correct.

--
David.

>> proc glimmix data=test;
>> class replicate treatment;
>> model X/n = treatment / solution;
>> random intercept / subject=replicate;
>> run;
>>
>> ods select lsmeans;
>> proc glimmix data=test;
>> class replicate treatment;
>> model X/n = treatment / solution;
>> random intercept / subject=replicate;
>> lsmeans treatment / cl ilink;
>> run;
>>
>> Joshua
>>
>>
>>        [[alternative HTML version deleted]]
>>
>> ______________________________________________
>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> and provide commented, minimal, self-contained, reproducible code.
>
>
>
> --
> Gregory (Greg) L. Snow Ph.D.
> 538280 at gmail.com
>
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help