[R] Linear model vs Mixed model

Wed Jul 13 19:26:52 CEST 2016

Utkarsh:  I think the differences between the lm and lmer estimates of the
intercept are consistent with the regularization effect expected with
mixed-effects models where the estimates shrink towards the mean slightly.
I don't think there is any reason to expect exact agreement between the lm
and lmer estimates.  I didn't mean to imply that the different
parameterization of the contrasts would make the lm estimates agree more
with the lmer estimates, only that it might be easier to compare the
regression summary output to see how similar/dissimilar they were.  But
your way of comparing them are adequate.  Why would you think they should
agree exactly?  Larger sample sizes might make them closer (but I don't
know your n) but my expectation is that they could only be close not
exactly in agreement.  The lmer intercept estimates are based on using a
normal distribution with an estimated variance to locate the subject
specific intercepts.  Why do you think those modeled intercepts should
exactly coincide with your fixed effects intercepts that makes no such
distributional assumption?

Brian

Brian S. Cade, PhD

U. S. Geological Survey
Fort Collins Science Center
2150 Centre Ave., Bldg. C
Fort Collins, CO  80526-8818

email:  cadeb at usgs.gov <brian_cade at usgs.gov>
tel:  970 226-9326

On Wed, Jul 13, 2016 at 10:06 AM, Utkarsh Singhal <utkarsh.iit at gmail.com>
wrote:

> Hi Brian,
>
> This makes some sense to me theoretically, but doesn't pan out with my
> experiment.
>
> The contrasts default was the following as you said:
> > options("contrasts")
> $contrasts
>         unordered           ordered
> "contr.treatment"      "contr.poly"
>
> I changed it as follows:
> > options(contracts=c(unordered="contr.sum", ordered="contr.poly"))
>
> But the output of 'lm' model was exactly the same as before.
>
> Then I tried the following:
> > coef(lm(Reaction ~ Days + Subject, sleepstudy,
> contrasts=list(Subject="contr.sum")))
>
> The model output appears slightly different, but the 'Subject* +
> Intercept' is still exactly the same as the default 'lm' model.
>
> Also, as you can verify below, the predictions from two 'lm' models are
> exactly the  same
>
> > fit_lm = lm(Reaction ~ Days + Subject, sleepstudy);
> > fit_lm2 = lm(Reaction ~ Days + Subject, sleepstudy,
> contrasts=list(Subject="contr.sum"))
> >
> > head(predict(fit_lm))
>        1        2        3        4        5        6
> 295.0310 305.4983 315.9656 326.4329 336.9002 347.3675
>
> > head(predict(fit_lm2))
>        1        2        3        4        5        6
> 295.0310 305.4983 315.9656 326.4329 336.9002 347.3675
>
> And these are quite different from the mixedmodel:
>
> > fit_lmer = lmer(Reaction ~ Days + (1| Subject), sleepstudy)
> > head(predict(fit_lmer))
>        1        2        3        4        5        6
> 292.1888 302.6561 313.1234 323.5907 334.0580 344.5252
>
> Any idea?
>
>
>
> Utkarsh Singhal
> 91.96508.54333
>
>
> On 13 July 2016 at 00:18, Cade, Brian <cadeb at usgs.gov> wrote:
>
>> Your lm() estimates are using the default contrasts of contr.treatment,
>> providing an intercept corresponding to your subject 308 and the other
>> subject* estimates are differences from subject 308 intercept.  You could
>> have specified this with contrasts as contr.sum and the estimates would be
>> more easily compared to the lmer() model estimates.  They are close but
>> will never be identical as the lmer() model estimates are based on assuming
>> a normal distribution with specified variance.  They rarely would be
>> identical.
>>
>> Brian
>>
>> Brian S. Cade, PhD
>>
>> U. S. Geological Survey
>> Fort Collins Science Center
>> 2150 Centre Ave., Bldg. C
>> Fort Collins, CO  80526-8818
>>
>> email:  cadeb at usgs.gov <brian_cade at usgs.gov>
>> tel:  970 226-9326
>>
>>
>> On Tue, Jul 12, 2016 at 12:10 PM, Utkarsh Singhal <utkarsh.iit at gmail.com>
>> wrote:
>>
>>> Hello Thierry,
>>>
>>> Thank you for your quick response. Sorry, but I am not sure if I follow
>>> what you said. I get the following outputs from the two models:
>>> > coef(lmer(Reaction ~ Days + (1| Subject), sleepstudy))
>>> Subject    (Intercept)     Days
>>> 308    292.1888 10.46729
>>> 309    173.5556 10.46729
>>> 310    188.2965 10.46729
>>> 330    255.8115 10.46729
>>> 331    261.6213 10.46729
>>> 332    259.6263 10.46729
>>> 333    267.9056 10.46729
>>> 334    248.4081 10.46729
>>> 335    206.1230 10.46729
>>> 337    323.5878 10.46729
>>> 349    230.2089 10.46729
>>> 350    265.5165 10.46729
>>> 351    243.5429 10.46729
>>> 352    287.7835 10.46729
>>> 369    258.4415 10.46729
>>> 370    245.0424 10.46729
>>> 371    248.1108 10.46729
>>> 372    269.5209 10.46729
>>>
>>> > coef(lm(Reaction ~ Days + Subject, sleepstudy))
>>> (Intercept)  295.03104
>>> Days          10.46729
>>> Subject309  -126.90085
>>> Subject310  -111.13256
>>> Subject330   -38.91241
>>> Subject331   -32.69778
>>> Subject332   -34.83176
>>> Subject333   -25.97552
>>> Subject334   -46.83178
>>> Subject335   -92.06379
>>> Subject337    33.58718
>>> Subject349   -66.29936
>>> Subject350   -28.53115
>>> Subject351   -52.03608
>>> Subject352    -4.71229
>>> Subject369   -36.09919
>>> Subject370   -50.43206
>>> Subject371   -47.14979
>>> Subject372   -24.24770
>>>
>>> Now, what I expected is the following:
>>>
>>>    - 'Intercept' of model-2 to match with Intercept of Subject-308 of
>>>    model-1
>>>    - 'Intercept+Subject309' of model-2 to match with Intercept of
>>>    Subject-309 of model-1
>>>    - and so on...
>>>
>>>
>>> What am I missing here?
>>>
>>> If it is difficult to explain this, can you alternately answer the
>>> following: "Is it possible to define the 'lm' and 'lmer' models above so
>>> they produce the same results (at least in terms of predictions)?"
>>>
>>> Thanks again.
>>>
>>> Utkarsh Singhal
>>> 91.96508.54333
>>>
>>>
>>> On 12 July 2016 at 19:15, Thierry Onkelinx <thierry.onkelinx at inbo.be>
>>> wrote:
>>>
>>> > The parametrisation is different.
>>> >
>>> > The intercept in model 1 is the effect of the "average" subject at
>>> days ==
>>> > 0.
>>> > The intercept in model 2 is the effect of the first subject at days ==
>>> 0.
>>> >
>>> > ir. Thierry Onkelinx
>>> > Instituut voor natuur- en bosonderzoek / Research Institute for Nature
>>> and
>>> > Forest
>>> > team Biometrie & Kwaliteitszorg / team Biometrics & Quality Assurance
>>> > Kliniekstraat 25
>>> > 1070 Anderlecht
>>> > Belgium
>>> >
>>> > To call in the statistician after the experiment is done may be no more
>>> > than asking him to perform a post-mortem examination: he may be able
>>> to say
>>> > what the experiment died of. ~ Sir Ronald Aylmer Fisher
>>> > The plural of anecdote is not data. ~ Roger Brinner
>>> > The combination of some data and an aching desire for an answer does
>>> not
>>> > ensure that a reasonable answer can be extracted from a given body of
>>> data.
>>> > ~ John Tukey
>>> >
>>> > 2016-07-12 15:35 GMT+02:00 Utkarsh Singhal <utkarsh.iit at gmail.com>:
>>> >
>>> >> Hi experts,
>>> >>
>>> >> While the slope is coming out to be identical in the two methods
>>> below,
>>> >> the
>>> >> intercepts are not. As far as I understand, both are formulations are
>>> >> identical in the sense that these are asking for a slope
>>> corresponding to
>>> >> 'Days' and a separate intercept term for each Subject.
>>> >>
>>> >> # Model-1
>>> >> library(lmer)
>>> >> coef(lmer(Reaction ~ Days + (1| Subject), sleepstudy))
>>> >>
>>> >> # Model-2
>>> >> coef(lm(Reaction ~ Days + Subject, sleepstudy))
>>> >>
>>> >> Can somebody tell me the reason? Are the above formulations actually
>>> >> different or is it due to different optimization method used?
>>> >>
>>> >> Thank you.
>>> >>
>>> >> Utkarsh Singhal
>>> >> 91.96508.54333
>>> >>
>>> >>         [[alternative HTML version deleted]]
>>> >>
>>> >> ______________________________________________
>>> >> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>> >> https://stat.ethz.ch/mailman/listinfo/r-help
>>> >> PLEASE do read the posting guide
>>> >> http://www.R-project.org/posting-guide.html
>>> >> and provide commented, minimal, self-contained, reproducible code.
>>> >>
>>> >
>>> >
>>>
>>>         [[alternative HTML version deleted]]
>>>
>>> ______________________________________________
>>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide
>>> http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>>
>>
>>
>

	[[alternative HTML version deleted]]