[R] Effect size of comparison of two levels of a factor in multiple linear regression

Sun Feb 3 16:09:50 CET 2008

Dear R users,

I have a linear model of the kind

outcome ~ treatment + covariate

where 'treatment' is a factor with three levels ("0", "1", and "2"),
and the covariate is continuous. Treatments "1" and "2" both have
regression coefficients significantly different from 0 when using
treatment contrasts with treatment "0" as the baseline. I would now like
to determine effect sizes (akin to Cohen's d in a two-sample comparison)
for the comparisons of treatments "1" and "2" with the baseline. I have
illustrated one way to do this in the reproducible example below and
would be grateful for any comments on its soundness. I have not yet
found a way to determine confidence intervals for the effect sizes
derived below and would appreciate tips on that.

set.seed(123456) # Make session reproducible

# Set up the treatment factor: three levels, 100 observations each
treatment <- factor(c(rep(0, 100), rep(1, 100), rep(2, 100)))

# Simulate outcomes
outcome <- rep(NA, 300)
outcome[treatment==0] <- rnorm(100, 10, 5) # baseline: mean=10, sd=5
outcome[treatment==1] <- rnorm(100, 30, 5) # effect size 4
outcome[treatment==2] <- rnorm(100, 40, 5) # effect size 6

# Check effect sizes (Cohen's d)
cohens.d <- function (x, y) {(mean(x)-mean(y))/sqrt((var(x)+var(y))/2) }
cohens.d(outcome[treatment==1], outcome[treatment==0])
# [1] 3.984774
cohens.d(outcome[treatment==2], outcome[treatment==0])
# [1] 6.167798
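The cohens.d() helper above averages the two group variances, which equals the usual pooled variance only when both groups have the same size (as with 100 observations each here). A sketch of the degrees-of-freedom-weighted version for unequal group sizes; `cohens.d.pooled` and `d.avg` are hypothetical names, and the small data vectors are made up purely for illustration:

```r
# Hypothetical helper: Cohen's d with the pooled SD weighted by each
# group's degrees of freedom (n - 1), suitable for unequal group sizes
cohens.d.pooled <- function(x, y) {
  nx <- length(x)
  ny <- length(y)
  pooled.var <- ((nx - 1) * var(x) + (ny - 1) * var(y)) / (nx + ny - 2)
  (mean(x) - mean(y)) / sqrt(pooled.var)
}

# Made-up deterministic data (no random numbers, so the script's
# random stream is left untouched)
x <- c(12, 15, 11, 14, 13, 16)
y <- c(10, 9, 12, 11, 8, 10)

# Unweighted average of the two variances, as in cohens.d()
d.avg <- function(x, y) (mean(x) - mean(y)) / sqrt((var(x) + var(y)) / 2)

d.avg(x, y) - cohens.d.pooled(x, y)            # 0 up to rounding: equal n
d.avg(x, y[1:3]) - cohens.d.pooled(x, y[1:3])  # nonzero: unequal n
```

With equal group sizes the two definitions coincide; with unequal sizes and unequal variances they diverge.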

# Sometimes standardized regression coefficients are recommended
# for determining effect size but that clearly doesn't work here:
coef(lm(scale(outcome) ~ treatment))
# (Intercept)  treatment1  treatment2
#   -1.233366    1.453152    2.246946
# The reason it doesn't work is that the difference of outcome
# means is divided by the SD of *all* outcomes, which also contains
# the variation between treatment groups, rather than by a
# within-group SD:
(mean(outcome[treatment==1])-mean(outcome[treatment==0]))/sd(outcome)
# [1] 1.453152
(mean(outcome[treatment==2])-mean(outcome[treatment==0]))/sd(outcome)
# [1] 2.246946
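A minimal sketch of the alternative denominator, using a small deterministic toy data set (not the simulated data above): the total sum of squares splits into a within-group and a between-group part, and the residual SD of the fitted model estimates only the within-group spread. Note that with three groups the residual SD pools over all three groups, not only the two being compared, so it can differ slightly from the two-sample pooled SD used by cohens.d(). All names here are hypothetical.

```r
# Deterministic toy data: three groups of 5 with identical within-group spread
spread <- c(-6, -3, 0, 3, 6)
toy <- data.frame(
  y = c(10 + spread, 30 + spread, 40 + spread),
  g = factor(rep(c("0", "1", "2"), each = 5))
)
fit <- lm(y ~ g, data = toy)

# Total sum of squares = within-group part + between-group part
ss.total   <- sum((toy$y - mean(toy$y))^2)
ss.within  <- sum(residuals(fit)^2)
ss.between <- sum(table(toy$g) * (tapply(toy$y, toy$g, mean) - mean(toy$y))^2)

# Two candidate denominators for the group differences
sd.total     <- sd(toy$y)           # what scale(y) divides by
sigma.within <- summary(fit)$sigma  # pooled within-group SD (all three groups)

d.total  <- coef(fit)[c("g1", "g2")] / sd.total
d.within <- coef(fit)[c("g1", "g2")] / sigma.within
```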

# Now, create a situation where Cohen's d cannot be calculated
# directly from group means and SDs by introducing a continuous covariate:
covariate <- 1:300
outcome <- outcome + rnorm(300, covariate, 2)
model <- lm(scale(outcome) ~ treatment + scale(covariate))
coef(model)
#      (Intercept)       treatment1       treatment2 scale(covariate)
#       -0.1720456        0.1994251        0.3167116        0.8753761
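One common covariate-adjusted analogue of Cohen's d (an assumption about what is wanted here; other definitions exist) divides each treatment coefficient by the residual SD of the model. A sketch on hypothetical toy data, showing that this ratio does not depend on whether the outcome and covariate are standardized: standardizing the outcome rescales the treatment coefficient and the residual SD by the same factor, and standardizing the covariate leaves the treatment coefficients unchanged.

```r
# Hypothetical toy data: treatment effect plus a linear covariate trend
spread <- c(2, -5, 1, 4, -2)
toy <- data.frame(g = factor(rep(c("0", "1", "2"), each = 5)), x = 1:15)
toy$y <- c(10 + spread, 30 + spread, 40 + spread) + 2 * toy$x

# Same model on the raw scale and with standardized outcome and covariate
fit.raw <- lm(y ~ g + x, data = toy)
fit.std <- lm(scale(y) ~ g + scale(x), data = toy)

# Adjusted effect size: treatment coefficient / residual SD
d.raw <- coef(fit.raw)[c("g1", "g2")] / summary(fit.raw)$sigma
d.std <- coef(fit.std)[c("g1", "g2")] / summary(fit.std)$sigma
all.equal(d.raw, d.std)  # TRUE: the ratio is scale-free
```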

# Proposed way to determine effect size: simulate outcomes for each
# treatment level from the posterior draws, holding the covariate fixed
# at one value (here its mean, which is zero after standardization)
library(MCMCpack)
no.of.sims <- 10000
sims.model <- MCMCregress(model, mcmc=no.of.sims)
sims.model[1:2,]
#      (Intercept) treatment1 treatment2 scale(covariate)      sigma2
# [1,]  -0.1780735  0.2024111  0.3395233        0.8682119 0.002617449
# [2,]  -0.1521623  0.1773623  0.2956053        0.8764573 0.003529013
# Posterior predictive draws for each treatment level at scale(covariate) = 0
sims.sd <- sqrt(sims.model[, "sigma2"])
sims.treat0 <- rnorm(no.of.sims, sims.model[, "(Intercept)"], sims.sd)
sims.treat1 <- rnorm(no.of.sims,
                     sims.model[, "(Intercept)"] + sims.model[, "treatment1"], sims.sd)
sims.treat2 <- rnorm(no.of.sims,
                     sims.model[, "(Intercept)"] + sims.model[, "treatment2"], sims.sd)
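Because the model contains no treatment-by-covariate interaction, the predicted difference between two treatment levels is the same at every covariate value, so fixing scale(covariate) at zero shifts all simulated groups equally and does not affect the comparison. A quick check with predict() on hypothetical toy data (factor g, covariate x):

```r
# Hypothetical toy data: factor g, covariate x, no interaction in the model
spread <- c(2, -5, 1, 4, -2)
toy <- data.frame(g = factor(rep(c("0", "1", "2"), each = 5)), x = 1:15)
toy$y <- c(10 + spread, 30 + spread, 40 + spread) + 2 * toy$x
fit <- lm(y ~ g + x, data = toy)

# Predictions for treatments "0" and "1" at three different covariate values
grid <- expand.grid(g = factor(c("0", "1"), levels = levels(toy$g)),
                    x = c(0, 5, 10))
pred <- predict(fit, newdata = grid)

# Treatment "1" minus treatment "0" at each covariate value
diffs <- pred[grid$g == "1"] - pred[grid$g == "0"]
diffs              # identical at every x ...
coef(fit)["g1"]    # ... and equal to the treatment coefficient
```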

# Calculate Cohen's d for simulated values
cohens.d(sims.treat1, sims.treat0)
# [1] 3.683093
cohens.d(sims.treat2, sims.treat0)
# [1] 5.782622
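Applying cohens.d() to posterior predictive draws puts both the residual variance and the posterior uncertainty about the coefficients into the denominator. An alternative (a swapped-in technique, not part of the original proposal) is to compute the ratio of each treatment coefficient to sqrt(sigma2) within every posterior draw and summarize that distribution with quantiles; the interval width is then governed by the data, and the number of draws only affects Monte Carlo precision. With the MCMCregress output above, this ratio would be sims.model[, "treatment1"] / sqrt(sims.model[, "sigma2"]). The sketch below instead draws from the standard posterior under the noninformative prior p(beta, sigma2) proportional to 1/sigma2 directly in base R, on hypothetical simulated data:

```r
set.seed(2008)
n.per <- 100
g <- factor(rep(c("0", "1", "2"), each = n.per))
x <- rnorm(3 * n.per)                                  # hypothetical covariate
y <- c(10, 30, 40)[as.integer(g)] + 3 * x + rnorm(3 * n.per, sd = 5)
fit <- lm(y ~ g + x)

# Posterior draws under the flat prior p(beta, sigma2) ~ 1/sigma2:
#   sigma2 | y       ~ (n - k) * s2 / chi-squared(n - k)
#   beta | sigma2, y ~ Normal(beta.hat, sigma2 * (X'X)^-1)
X <- model.matrix(fit)
n <- nrow(X)
k <- ncol(X)
s2 <- sum(residuals(fit)^2) / (n - k)
R <- chol(chol2inv(chol(crossprod(X))))   # t(R) %*% R = (X'X)^-1
n.draws <- 4000
sigma2 <- (n - k) * s2 / rchisq(n.draws, df = n - k)
z <- matrix(rnorm(n.draws * k), n.draws, k)
# Row i: beta.hat + sqrt(sigma2[i]) * (correlated normal draw)
beta <- matrix(coef(fit), n.draws, k, byrow = TRUE) + (z %*% R) * sqrt(sigma2)
colnames(beta) <- colnames(X)

# One effect size per draw: treatment coefficient over residual SD
d.draws <- beta[, c("g1", "g2")] / sqrt(sigma2)
d.summary <- apply(d.draws, 2, quantile, probs = c(0.025, 0.5, 0.975))
d.summary
```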

These values are reasonably close to the ones (4 and 6) I plugged in at
the beginning. (Strictly, adding the covariate term with noise SD 2
raises the residual SD to sqrt(5^2 + 2^2), about 5.39, so the values to
expect here are about 20/5.39 = 3.71 and 30/5.39 = 5.57.) It would be
even nicer to have confidence intervals for them, but if I bootstrap one
from the simulated outcomes, its width depends on the number of
simulations and is therefore arbitrary. If anyone knows a better way to
get at the effect sizes I'm looking for, or how to get confidence
intervals for them, I would greatly appreciate it.
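For a frequentist confidence interval, one option (a swapped-in technique, not built on the MCMC output) is a nonparametric bootstrap of the adjusted effect size itself: refit the model on resampled data and recompute the coefficient-to-residual-SD ratio each time. The interval width then reflects the sample size; the number of bootstrap replicates only controls Monte Carlo error. A sketch on hypothetical simulated data with the same design (three groups of 100); `adj.d` is a hypothetical helper name:

```r
set.seed(2008)
n.per <- 100
dat <- data.frame(g = factor(rep(c("0", "1", "2"), each = n.per)),
                  x = rnorm(3 * n.per))                 # hypothetical covariate
dat$y <- c(10, 30, 40)[as.integer(dat$g)] + 3 * dat$x + rnorm(3 * n.per, sd = 5)

# Covariate-adjusted effect sizes: treatment coefficients over residual SD
adj.d <- function(d) {
  fit <- lm(y ~ g + x, data = d)
  coef(fit)[c("g1", "g2")] / summary(fit)$sigma
}

# Stratified case resampling: rows are resampled within each treatment
# group, so every bootstrap sample keeps 100 observations per level
rows.by.group <- split(seq_len(nrow(dat)), dat$g)
n.boot <- 1000
boot.d <- t(replicate(n.boot, {
  idx <- unlist(lapply(rows.by.group, sample, replace = TRUE))
  adj.d(dat[idx, ])
}))

d.hat <- adj.d(dat)
ci <- apply(boot.d, 2, quantile, probs = c(0.025, 0.975))  # percentile CI
d.hat
ci
```

The boot package's boot() and boot.ci() functions implement the same resampling idea and offer other interval types, such as BCa.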

Thanks,

Christoph

--
Christoph Mathys, M.S.
Music and Neuroimaging Laboratory
Beth Israel Deaconess Medical Center
and Harvard Medical School