[R] Linear model vs Mixed model

Wed Jul 13 20:19:16 CEST 2016

Thanks Brian for all your kind help.

"didn't mean to imply that the different parameterization of the contrasts
would make the lm estimates agree more with the lmer estimates, only that
it might be easier to compare the regression summary output to see how
similar/dissimilar they were ".
Got it now, and yes it makes sense for easy comparison.

And yes, my 'n' is indeed small. I don't want to match them exactly, but
just wanted to confirm that the underlying principle in the two methods is
not entirely different and will go away with large data. In absence of
that, I don't know how to choose between the two models. This brings to my
real problem. Take the below two models for example:

lm(y ~ x1 + x2*Subject)
lmer(y ~ x1 + x2 + Subject + (x2|Subject))

With my limited knowledge, both of these should do the following:

   - fit linear model
   - to predict 'y'
   - by capturing the effect due to predictor variables x1, x2 and Subject
   - and an interaction effect of the combination of x2 and Subject

These can of course use different optimization methods or make different
distributional assumptions, but that effect should ideally go away with
large data. So, I am not too concerned about that.

I can sleep peacefully if you just say that the difference is due to
optimization method used or distributional assumptions :).

But if it is anything more too it, then how do I decide which of the above
two models to choose in what scenario.

Regards,

Utkarsh Singhal
91.96508.54333

On 13 July 2016 at 22:56, Cade, Brian <cadeb at usgs.gov> wrote:

> Utkarsh:  I think the differences between the lm and lmer estimates of the
> intercept are consistent with the regularization effect expected with
> mixed-effects models where the estimates shrink towards the mean slightly.
> I don't think there is any reason to expect exact agreement between the lm
> and lmer estimates.  I didn't mean to imply that the different
> parameterization of the contrasts would make the lm estimates agree more
> with the lmer estimates, only that it might be easier to compare the
> regression summary output to see how similar/dissimilar they were.  But
> your way of comparing them are adequate.  Why would you think they should
> agree exactly?  Larger sample sizes might make them closer (but I don't
> know your n) but my expectation is that they could only be close not
> exactly in agreement.  The lmer intercept estimates are based on using a
> normal distribution with an estimated variance to locate the subject
> specific intercepts.  Why do you think those modeled intercepts should
> exactly coincide with your fixed effects intercepts that makes no such
> distributional assumption?
>
> Brian
>
> Brian S. Cade, PhD
>
> U. S. Geological Survey
> Fort Collins Science Center
> 2150 Centre Ave., Bldg. C
> Fort Collins, CO  80526-8818
>
> email:  cadeb at usgs.gov <brian_cade at usgs.gov>
> tel:  970 226-9326
>
>
> On Wed, Jul 13, 2016 at 10:06 AM, Utkarsh Singhal <utkarsh.iit at gmail.com>
> wrote:
>
>> Hi Brian,
>>
>> This makes some sense to me theoretically, but doesn't pan out with my
>> experiment.
>>
>> The contrasts default was the following as you said:
>> > options("contrasts")
>> $contrasts
>>         unordered           ordered
>> "contr.treatment"      "contr.poly"
>>
>> I changed it as follows:
>> > options(contracts=c(unordered="contr.sum", ordered="contr.poly"))
>>
>> But the output of 'lm' model was exactly the same as before.
>>
>> Then I tried the following:
>> > coef(lm(Reaction ~ Days + Subject, sleepstudy,
>> contrasts=list(Subject="contr.sum")))
>>
>> The model output appears slightly different, but the 'Subject* +
>> Intercept' is still exactly the same as the default 'lm' model.
>>
>> Also, as you can verify below, the predictions from two 'lm' models are
>> exactly the  same
>>
>> > fit_lm = lm(Reaction ~ Days + Subject, sleepstudy);
>> > fit_lm2 = lm(Reaction ~ Days + Subject, sleepstudy,
>> contrasts=list(Subject="contr.sum"))
>> >
>> > head(predict(fit_lm))
>>        1        2        3        4        5        6
>> 295.0310 305.4983 315.9656 326.4329 336.9002 347.3675
>>
>> > head(predict(fit_lm2))
>>        1        2        3        4        5        6
>> 295.0310 305.4983 315.9656 326.4329 336.9002 347.3675
>>
>> And these are quite different from the mixedmodel:
>>
>> > fit_lmer = lmer(Reaction ~ Days + (1| Subject), sleepstudy)
>> > head(predict(fit_lmer))
>>        1        2        3        4        5        6
>> 292.1888 302.6561 313.1234 323.5907 334.0580 344.5252
>>
>> Any idea?
>>
>>
>>
>> Utkarsh Singhal
>> 91.96508.54333
>>
>>
>> On 13 July 2016 at 00:18, Cade, Brian <cadeb at usgs.gov> wrote:
>>
>>> Your lm() estimates are using the default contrasts of contr.treatment,
>>> providing an intercept corresponding to your subject 308 and the other
>>> subject* estimates are differences from subject 308 intercept.  You could
>>> have specified this with contrasts as contr.sum and the estimates would be
>>> more easily compared to the lmer() model estimates.  They are close but
>>> will never be identical as the lmer() model estimates are based on assuming
>>> a normal distribution with specified variance.  They rarely would be
>>> identical.
>>>
>>> Brian
>>>
>>> Brian S. Cade, PhD
>>>
>>> U. S. Geological Survey
>>> Fort Collins Science Center
>>> 2150 Centre Ave., Bldg. C
>>> Fort Collins, CO  80526-8818
>>>
>>> email:  cadeb at usgs.gov <brian_cade at usgs.gov>
>>> tel:  970 226-9326
>>>
>>>
>>> On Tue, Jul 12, 2016 at 12:10 PM, Utkarsh Singhal <utkarsh.iit at gmail.com
>>> > wrote:
>>>
>>>> Hello Thierry,
>>>>
>>>> Thank you for your quick response. Sorry, but I am not sure if I follow
>>>> what you said. I get the following outputs from the two models:
>>>> > coef(lmer(Reaction ~ Days + (1| Subject), sleepstudy))
>>>> Subject    (Intercept)     Days
>>>> 308    292.1888 10.46729
>>>> 309    173.5556 10.46729
>>>> 310    188.2965 10.46729
>>>> 330    255.8115 10.46729
>>>> 331    261.6213 10.46729
>>>> 332    259.6263 10.46729
>>>> 333    267.9056 10.46729
>>>> 334    248.4081 10.46729
>>>> 335    206.1230 10.46729
>>>> 337    323.5878 10.46729
>>>> 349    230.2089 10.46729
>>>> 350    265.5165 10.46729
>>>> 351    243.5429 10.46729
>>>> 352    287.7835 10.46729
>>>> 369    258.4415 10.46729
>>>> 370    245.0424 10.46729
>>>> 371    248.1108 10.46729
>>>> 372    269.5209 10.46729
>>>>
>>>> > coef(lm(Reaction ~ Days + Subject, sleepstudy))
>>>> (Intercept)  295.03104
>>>> Days          10.46729
>>>> Subject309  -126.90085
>>>> Subject310  -111.13256
>>>> Subject330   -38.91241
>>>> Subject331   -32.69778
>>>> Subject332   -34.83176
>>>> Subject333   -25.97552
>>>> Subject334   -46.83178
>>>> Subject335   -92.06379
>>>> Subject337    33.58718
>>>> Subject349   -66.29936
>>>> Subject350   -28.53115
>>>> Subject351   -52.03608
>>>> Subject352    -4.71229
>>>> Subject369   -36.09919
>>>> Subject370   -50.43206
>>>> Subject371   -47.14979
>>>> Subject372   -24.24770
>>>>
>>>> Now, what I expected is the following:
>>>>
>>>>    - 'Intercept' of model-2 to match with Intercept of Subject-308 of
>>>>    model-1
>>>>    - 'Intercept+Subject309' of model-2 to match with Intercept of
>>>>    Subject-309 of model-1
>>>>    - and so on...
>>>>
>>>>
>>>> What am I missing here?
>>>>
>>>> If it is difficult to explain this, can you alternately answer the
>>>> following: "Is it possible to define the 'lm' and 'lmer' models above so
>>>> they produce the same results (at least in terms of predictions)?"
>>>>
>>>> Thanks again.
>>>>
>>>> Utkarsh Singhal
>>>> 91.96508.54333
>>>>
>>>>
>>>> On 12 July 2016 at 19:15, Thierry Onkelinx <thierry.onkelinx at inbo.be>
>>>> wrote:
>>>>
>>>> > The parametrisation is different.
>>>> >
>>>> > The intercept in model 1 is the effect of the "average" subject at
>>>> days ==
>>>> > 0.
>>>> > The intercept in model 2 is the effect of the first subject at days
>>>> == 0.
>>>> >
>>>> > ir. Thierry Onkelinx
>>>> > Instituut voor natuur- en bosonderzoek / Research Institute for
>>>> Nature and
>>>> > Forest
>>>> > team Biometrie & Kwaliteitszorg / team Biometrics & Quality Assurance
>>>> > Kliniekstraat 25
>>>> > 1070 Anderlecht
>>>> > Belgium
>>>> >
>>>> > To call in the statistician after the experiment is done may be no
>>>> more
>>>> > than asking him to perform a post-mortem examination: he may be able
>>>> to say
>>>> > what the experiment died of. ~ Sir Ronald Aylmer Fisher
>>>> > The plural of anecdote is not data. ~ Roger Brinner
>>>> > The combination of some data and an aching desire for an answer does
>>>> not
>>>> > ensure that a reasonable answer can be extracted from a given body of
>>>> data.
>>>> > ~ John Tukey
>>>> >
>>>> > 2016-07-12 15:35 GMT+02:00 Utkarsh Singhal <utkarsh.iit at gmail.com>:
>>>> >
>>>> >> Hi experts,
>>>> >>
>>>> >> While the slope is coming out to be identical in the two methods
>>>> below,
>>>> >> the
>>>> >> intercepts are not. As far as I understand, both are formulations are
>>>> >> identical in the sense that these are asking for a slope
>>>> corresponding to
>>>> >> 'Days' and a separate intercept term for each Subject.
>>>> >>
>>>> >> # Model-1
>>>> >> library(lmer)
>>>> >> coef(lmer(Reaction ~ Days + (1| Subject), sleepstudy))
>>>> >>
>>>> >> # Model-2
>>>> >> coef(lm(Reaction ~ Days + Subject, sleepstudy))
>>>> >>
>>>> >> Can somebody tell me the reason? Are the above formulations actually
>>>> >> different or is it due to different optimization method used?
>>>> >>
>>>> >> Thank you.
>>>> >>
>>>> >> Utkarsh Singhal
>>>> >> 91.96508.54333
>>>> >>
>>>> >>         [[alternative HTML version deleted]]
>>>> >>
>>>> >> ______________________________________________
>>>> >> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>>> >> https://stat.ethz.ch/mailman/listinfo/r-help
>>>> >> PLEASE do read the posting guide
>>>> >> http://www.R-project.org/posting-guide.html
>>>> >> and provide commented, minimal, self-contained, reproducible code.
>>>> >>
>>>> >
>>>> >
>>>>
>>>>         [[alternative HTML version deleted]]
>>>>
>>>> ______________________________________________
>>>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>> PLEASE do read the posting guide
>>>> http://www.R-project.org/posting-guide.html
>>>> and provide commented, minimal, self-contained, reproducible code.
>>>>
>>>
>>>
>>
>

	[[alternative HTML version deleted]]