[R] Effect size of comparison of two levels of a factor in multiple linear regression

Christoph Mathys cmathys at bidmc.harvard.edu
Wed Feb 6 13:05:56 CET 2008


On Feb 4, 08:49 AM, Chuck Cleland wrote:
> On 2/3/2008 10:09 AM, Christoph Mathys wrote:
>> Dear R users,
>> I have a linear model of the kind
>> outcome ~ treatment + covariate
>> where 'treatment' is a factor with three levels ("0", "1", and "2"),
>> and the covariate is continuous. Treatments "1" and "2" both have
>> regression coefficients significantly different from 0 when using
>> treatment contrasts with treatment "0" as the baseline. I would now like
>> to determine effect sizes (akin to Cohen's d in a two-sample comparison)
>> for the comparison to baseline of treatments "1" and "2". I have
>> illustrated a way to do this in the reproducible example below and am
>> grateful for any comments on the soundness of what I'm doing. I have not
>> yet found a way to determine confidence intervals for the effect sizes
>> derived below and would appreciate tips on that.
>
>   How about scaling the outcome by the residual standard error from the 
> unstandardized model?  For example:

Yes, that also works. It's actually what my simulation does implicitly
and what I should have done in the first place. It has the added benefit
that a confidence interval can be given. Thanks.

>
> library(MASS)
>
> cmat <- diag(3)
> diag(cmat) <- c(25,25,25)
>
> df <- data.frame(Y = c(mvrnorm(100, mu=c(10,30,40), Sigma=cmat, 
> empirical=TRUE)), TX = factor(rep(c(0,1,2), each=100)))
>
> fm1 <- lm(Y ~ TX, data = df)
> fm2 <- lm(scale(Y, scale=summary(fm1)$sigma) ~ TX, data = df)
>
> summary(fm1)
>
> Call:
> lm(formula = Y ~ TX, data = df)
>
> Residuals:
>      Min       1Q   Median       3Q      Max
> -12.7260  -3.5215  -0.1982   3.4517  12.1690
>
> Coefficients:
>             Estimate Std. Error t value Pr(>|t|)
> (Intercept)  10.0000     0.5000   20.00   <2e-16 ***
> TX1          20.0000     0.7071   28.28   <2e-16 ***
> TX2          30.0000     0.7071   42.43   <2e-16 ***
> ---
> Signif. codes:  0 ???***??? 0.001 ???**??? 0.01 ???*??? 0.05 ???.??? 0.1 
> ??? ??? 1
>
> Residual standard error: 5 on 297 degrees of freedom
> Multiple R-squared: 0.8627,     Adjusted R-squared: 0.8618
> F-statistic: 933.3 on 2 and 297 DF,  p-value: < 2.2e-16
>
> summary(fm2)
>
> Call:
> lm(formula = scale(Y, scale = summary(fm1)$sigma) ~ TX, data = df)
>
> Residuals:
>      Min       1Q   Median       3Q      Max
> -2.54521 -0.70431 -0.03965  0.69034  2.43380
>
> Coefficients:
>             Estimate Std. Error t value Pr(>|t|)
> (Intercept)  -3.3333     0.1000  -33.33   <2e-16 ***
> TX1           4.0000     0.1414   28.28   <2e-16 ***
> TX2           6.0000     0.1414   42.43   <2e-16 ***
> ---
> Signif. codes:  0 ???***??? 0.001 ???**??? 0.01 ???*??? 0.05 ???.??? 0.1 
> ??? ??? 1
>
> Residual standard error: 1 on 297 degrees of freedom
> Multiple R-squared: 0.8627,     Adjusted R-squared: 0.8618
> F-statistic: 933.3 on 2 and 297 DF,  p-value: < 2.2e-16
>
> confint(fm2)
>                 2.5 %    97.5 %
> (Intercept) -3.530132 -3.136535
> TX1          3.721685  4.278315
> TX2          5.721685  6.278315
>
>   I've never seen this approach before, and I'm not how it would work when 
> the variances within groups are heterogeneous or one or more covariates are 
> added to the model.

Strictly speaking, the approach breaks down when we don't have
homogeneity of variances across groups. But luckily, I have it in the
real data I'm looking at.

When covariates are added, one has to decide case by case what to do
about them, namely which model's sigma to standardize against.
Standardizing against the sigma of a model that includes a given
covariate implies looking at effect sizes in a population where that
covariate has a fixed value. On the other hand, standardizing against the
sigma of a model that does not include a given covariate implies looking
at effect sizes in a population where that covariate varies.

> hope this helps,

Sure did. Thanks again,

Christoph

--
Christoph Mathys, M.S.
Music and Neuroimaging Laboratory
Beth Israel Deaconess Medical Center
and Harvard Medical School



More information about the R-help mailing list