[R] the effect of blocking on the size of confidence intervals - analysis using lme and lmer

Mon Mar 16 23:24:05 CET 2009

This is a follow-up mail of "the effect of blocking on the size of
confidence intervals - analysis using aov".

In both mails I pursue the idea of using blocking factors in order to
reduce the width of confidence intervals.

My dataset comprises,
a quantitative response variable, namely: "response", and
three categorical eplanatory variables, namely: "method", "target", and
"query".
The method variable has 4 levels, namely:  XXX.pg.trained,
XXX.rpg.trained, MACCS, and FTREES.
The target variable has 28 levels.
The query variable has 84 levels.
For every target three queries are selected. Thus "query" is nested in
"target".
The design is completely balanced.

"response" "method" "target" "query"
0.68   "XXX.pg.trained" "06243" "06243:218379"
0.96   "XXX.pg.trained" "06243" "06243:218384"
0.72   "XXX.pg.trained" "06243" "06243:246192"
0.7696 "XXX.pg.trained" "06245" "06245:235027"
...

For the complete dataset see the attached file data.txt

Using lm and lme, I analyze three models:
> m0.lm  = lm ( response ~ method -1, data = d)
> m1.lme = lme( response ~ method -1, random = ~ 1 | target, data = d)
> m2.lme = lme( response ~ method -1, random = ~ 1 | query , data = d)

Using lmer the calls are:
> m1.lmer = lmer( response ~ method -1 + ( 1 | target ), data = d)
...

> summary.lm( m0.lm )

...
                      Estimate Std. Error t value Pr(>|t|)
methodFTREES           0.34163    0.03221   10.61   <2e-16 ***
methodMACCS            0.45525    0.03221   14.13   <2e-16 ***
methodXXX.pg.trained   0.49498    0.03221   15.37   <2e-16 ***
methodXXX.rpg.trained  0.45049    0.03221   13.99   <2e-16 ***
...
Residual standard error: 0.2952 on 332 degrees of freedom
...

> summary( m1.lme )

...
                         Value  Std.Error  DF   t-value p-value
methodFTREES          0.3416262 0.04871665 305  7.012514       0
methodMACCS           0.4552452 0.04871665 305  9.344756       0
methodXXX.pg.trained  0.4949833 0.04871665 305 10.160455       0
methodXXX.rpg.trained 0.4504917 0.04871665 305  9.247180       0
...

> summary( m2.lme )

...
                          Value  Std.Error  DF  t-value p-value
methodFTREES          0.3416262 0.03220994 249 10.60624       0
methodMACCS           0.4552452 0.03220994 249 14.13369       0
methodXXX.pg.trained  0.4949833 0.03220994 249 15.36741       0
methodXXX.rpg.trained 0.4504917 0.03220994 249 13.98611       0
...

Thus the confidence interval for "methodFTREES" is given by:

0.34163   +- qt( 0.025, 332 )*0.03221    = ( 0.2782686, 0.4049914 )  # m0.lm
0.3416262 +- qt( 0.025, 305 )*0.04871665 = ( 0.2457629, 0.4374895 )  #
m1.lme
0.3416262 +- qt( 0.025, 249 )*0.03220994 = ( 0.2781875, 0.4050649 )  #
m2.lme

Thus the interval for m1.lme is larger than the interval for m0.lm which
contradicts the utility of blocking...

Using lmer instead of lme, identical results can be derived.
Note that the intervals for lmer can alternatively be derived by:

> intervals( m1.lme )

...
methodFTREES          0.2457629 0.3416262 0.4374895
methodMACCS           0.3593820 0.4552452 0.5511085
methodXXX.pg.trained  0.3991201 0.4949833 0.5908466
methodXXX.rpg.trained 0.3546284 0.4504917 0.5463549
...

I have the following questions:

- How are the  std. errors of lme calculated?
It seems like the std errors of lme/lmer are more conservative than the
std errors reported by lm. This would explain for the increase of the
std error (and hence the size of the confidence interval) from m0.lm to
m1.lme.
- Is its possible to use lme without a random term, or alternatively
with a trivial random term?
- Is the overall approach: use of blocking factors to reduce the
residual varaince, and hence to decrease the width of the confidence
intervals valid?

Using lmer, alternatively to the calculation of CIs, credible intervals
can be calculated:

> m1.sample = mcmcsamp( m1.lmer, n =10000 )
> m1.ci = apply( m1.sample, 2, function(x)quantile(x, prob=c(.025,0.975)))
> m1.ci[,"methodFTREES"]
     2.5%     97.5%
0.2405115 0.4424920

...
> m2.ci[,"methodFTREES"]
     2.5%     97.5%
0.2767626 0.4042516

For my intended use: the comparison of the average performance of
different methods credible intervals a probably as good as CI. However,
how do I calculate credible intervals for m0.lm???

Thanks for your help,

Lars

-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: data.txt
URL: <https://stat.ethz.ch/pipermail/r-help/attachments/20090316/0697521e/attachment-0003.txt>