[R] Very small random effect estimation in lmer but not in stata xtmixed

Fri May 4 05:38:17 CEST 2012

I have tried to replicate Mohammed's problem using synthetic data and
used the parameter estimates from his Stata fit for generating the
data. The data has the following notable features:

- sample size is rather small: 30 groups with 3 observations each
- residual variance is high relative to the group variance

The data generating model*:

yik=3.1+2.3*dum+uk+eik, with uk~N(0,1) and eik~N(0,3.5^2)

where -dum- is a binary predictor variable.

The Stata fit to these data then looks as follows (I only show the
random effects estimates):

. xtmixed yik dum || id:, reml

Mixed-effects REML regression                   Number of obs      =        90
Group variable: id                              Number of groups   =        30

<snip>

------------------------------------------------------------------------------
  Random-effects Parameters  |   Estimate   Std. Err.     [95% Conf. Interval]
-----------------------------+------------------------------------------------
id: Identity                 |
                   sd(_cons) |   .9330436   .7147523      .2078951    4.187546
-----------------------------+------------------------------------------------
                sd(Residual) |   3.269403   .2998043      2.731575    3.913124
------------------------------------------------------------------------------

We can see that Stata does a fairly good job in recovering the
variance parameters.

Fitting the model with -lmer- shows:

Linear mixed model fit by REML
Formula: yik ~ dum + (1 | id)
   Data: data3
   AIC   BIC logLik deviance REMLdev
 480.6 490.6 -236.3    473.5   472.6
Random effects:
 Groups   Name        Variance   Std.Dev.
 id       (Intercept) 1.1764e-09 3.4298e-05
 Residual             1.1542e+01 3.3973e+00
Number of obs: 90, groups: id, 30

from which we can see that the group level variance is estimated as
essentially zero, which is clearly quite a bit away from the truth.

So it seems that when
a) the sample size is small, and
b) residual variance is high relative to group variance

the -lmer- estimates can be quite off. If the variances are 1/1, both
Stata and lmer are producing the same results (with the N=90).
However, with variances of around 10/1 (residual/group variance),
sample size needs to improve significantly for lmer to recover the
correct parameters. With 300 groups and 3 observations each (i.e.
N=900), estimation of the group level variance was still off. With
3000 groups and 3 observations each (i.e. N=9000), estimation was fine
again.

So in conclusion, I would say that if one's data meets the above
conditions [a) and b)], lmer variance estimates should be used
cautiously and possibly be compared to results from other software
packages. Perhaps all this should also be inspected more
systematically than I did here. However, when unsure, it is always a
good idea to generate synthetic data that matches some key features of
one's real data but for which the true parameters are known, and then
see how it plays out.

Cheers,

Joerg

*Here is my Stata code that I used for generating the data:

*---Stata code-------------
	clear
	set seed 123
	set obs 30
	gen id=_n
	gen uk=rnormal()
	expand 3
	bys id: gen time=_n
	gen p=runiform()
	gen dum=cond(p<.5, 1, 0)
	gen eik=rnormal(0,3.5)
	gen yik=3.1+2.3*dum+uk+eik
*--------------------------

On Thu, May 3, 2012 at 5:37 PM, Joerg Luedicke <joerg.luedicke at gmail.com> wrote:
> I just realized that I have sent this only to Mohammed. So here it is
> to the list:
>
> These two model fits should yield the same results. In fact, if we use
> some simulated data generated by the model
>
> yik=2+0.5*xik1+0.25*xik2+uk+eik , with uk~N(0, 1) and eik~N(0, 1)
>
> and compare the results between Stata and R:
>
> . xtmixed y xik1 xik2 || id:, reml
>
> Mixed-effects REML regression                   Number of obs      =      5000
> Group variable: id                              Number of groups   =      1000
>
>                                               Obs per group: min =         5
>                                                              avg =       5.0
>                                                              max =         5
>
>
>                                               Wald chi2(2)       =   2083.89
> Log restricted-likelihood = -8013.0388          Prob > chi2        =    0.0000
>
> ------------------------------------------------------------------------------
>          y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
> -------------+----------------------------------------------------------------
>       xik1 |   .4708663    .025753    18.28   0.000     .4203914    .5213411
>       xik2 |   .2726184   .0256147    10.64   0.000     .2224145    .3228222
>      _cons |   2.007218   .0354583    56.61   0.000     1.937721    2.076715
> ------------------------------------------------------------------------------
>
> ------------------------------------------------------------------------------
>  Random-effects Parameters  |   Estimate   Std. Err.     [95% Conf. Interval]
> -----------------------------+------------------------------------------------
> id: Identity                 |
>                  sd(_cons) |   1.028595    .027431      .9762119    1.083789
> -----------------------------+------------------------------------------------
>               sd(Residual) |   .9980663   .0111614      .9764285    1.020184
> ------------------------------------------------------------------------------
>
> Linear mixed model fit by REML
> Formula: y ~ xik1 + xik2 + (1 | id)
>  Data: data3
>  AIC   BIC logLik deviance REMLdev
>  16036 16069  -8013    16009   16026
> Random effects:
>  Groups   Name        Variance Std.Dev.
>  id       (Intercept) 1.05801  1.02859
>  Residual             0.99614  0.99807
> Number of obs: 5000, groups: id, 1000
>
> Fixed effects:
>           Estimate Std. Error t value
> (Intercept)  2.00722    0.03546   56.61
> xik1         0.47087    0.02575   18.28
> xik2         0.27262    0.02561   10.64
>
> Correlation of Fixed Effects:
>    (Intr) xik1
> xik1 -0.007
> xik2  0.005 -0.798
>
> we can see that the results are _exactly_ the same, both for fixed and
> random effects. The correlation of fixed effects, of which results are
> coming along when using the -summary- function in R are irrelevant for
> the model fit. In the above data, variables xik1 and xik2 are drawn
> from a multivariate normal distribution with r=0.8.
>
> Thomas, OPs model contains only varying intercepts so there are no
> correlations of random effects in his model.
>
> I don't know what went wrong. Mohammed, how are your variables
> measured? Maybe huge differences in scale or something like that are
> causing a problem in one of the packages. Perhaps try to center or
> standardize your independent variable and see if that helps.
>
> Also note that in general, postings that are concerning
> multilevel/mixed-effects models might have a better home over at the R
> mixed model forum
> (https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models).
>
> Joerg
>
> On Thu, May 3, 2012 at 4:50 AM, Mohammed Mohammed
> <M.A.MOHAMMED at bham.ac.uk> wrote:
>> Hi folks
>>
>> I am using the lmer function (in the lme4 library) to analyse some data where individuals are clustered into sets (using the SetID variable) with a single fixed effect (cc - 0 or 1). The lmer model and output is shown below.
>> Whilst the fixed effects are consistent with stata (using xtmixed, see below), the std dev of the random effect for SetID is very very small (3.5803e-05)compared to stata's (see below 1.002). Any ideas why this should be happening please....?
>>
>>
>> LMER MODEL
>>
>> summary(lmer(AnxietyScore ~ cc + (1|SetID), data=mydf))
>> Linear mixed model fit by REML
>> Formula: AnxietyScore ~ cc + (1 | SetID)
>>   Data: mydf
>>   AIC   BIC logLik deviance REMLdev
>> 493.4 503.4 -242.7    486.6   485.4
>> Random effects:
>> Groups   Name        Variance   Std.Dev.
>>  SetID    (Intercept) 1.2819e-09 3.5803e-05
>> Residual             1.3352e+01 3.6540e+00
>> Number of obs: 90, groups: SetID, 33
>>
>> Fixed effects:
>>            Estimate Std. Error t value
>> (Intercept)   3.1064     0.5330   5.828
>> cc            2.3122     0.7711   2.999
>>
>> Correlation of Fixed Effects:
>>   (Intr)
>> cc -0.691
>>
>>
>> STATA XTMIXED
>>
>> xtmixed anxietyscore cc || setid:, reml
>>
>> Mixed-effects REML regression                   Number of obs      =        90
>> Group variable: setid                           Number of groups   =        33
>> Log restricted-likelihood = -242.48259          Prob > chi2        =    0.0023
>> ------------------------------------------------------------------------------
>> anxietyscore |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
>> -------------+----------------------------------------------------------------
>>          cc |   2.289007   .7492766     3.05   0.002     .8204519    3.757562
>>       _cons |   3.116074   .5464282     5.70   0.000     2.045094    4.187053
>> ------------------------------------------------------------------------------
>> ------------------------------------------------------------------------------
>>  Random-effects Parameters  |   Estimate   Std. Err.     [95% Conf. Interval]
>> -----------------------------+------------------------------------------------
>> setid: Identity              |
>>                   sd(_cons) |   1.002484    .797775      .2107137    4.769382
>> -----------------------------+------------------------------------------------
>>                sd(Residual) |   3.515888   .3281988      2.928045     4.22175
>> ------------------------------------------------------------------------------
>>
>>
>> with thanks & best wishes
>> M
>>
>>        [[alternative HTML version deleted]]
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.